Introduction
This page hosts documentation and specifications for some of the cryptographic algorithms of Mina. For the Rust documentation, see here.
Note that this book is a work in progress, not necessarily reflecting the current state of Mina.
Authors: Izaak Meckler, Vanishree Rao, Matthew Ryan, Anaïs Querol, Joseph Spadavecchia, David Wong
In memory of Vitaly Zelov.
Terminology
This document is intended primarily to communicate mathematical ideas to programmers with some experience of math.
To that end, we will often be ambivalent about the difference between sets and types (in whatever programming language, but usually I am imagining some idealized form of Rust or OCaml). So for example we may write

- $A$ is a type
- $A$ is a set
- $A : \mathsf{Type}$

and these should all be interpreted the same way, assuming it makes sense in the context. Similarly, we may write either of the following, and potentially mean the same thing:

- $a \in A$
- $a$ is an element of $A$
- $a : A$
We use

- $A \to B$ for function types
- $x \mapsto f(x)$ for defining functions

Also, we usually assume functions are computed by a program (in whatever sense of "program" makes sense in the given context). So if we say "let $f : A \to B$", we typically mean that

- $A$ and $B$ are types in some programming language which makes sense in the context (or maybe that they are sets, again depending on context)
- $f$ is actually a program computing a function of the type $A \to B$ (again, in whatever sense of program makes sense in context)
Groups
Groups are an algebraic structure which

- are used in most widely deployed signature schemes
- are used in most widely deployed zk-SNARKs
- generalize the notion of a collection of symmetries of an object
- generalize certain aspects of numbers
I will not touch on the last two too much.
First let’s do a quick definition and then some examples.
Definition. A group is a set $G$ with a binary operation $\circ : G \times G \to G$, an identity $e \in G$ for that operation, and an inverse map $(\cdot)^{-1} : G \to G$ such that

- $\circ$ is associative: $a \circ (b \circ c) = (a \circ b) \circ c$ for all $a, b, c \in G$.
- $e$ is an identity for $\circ$. I.e., $a \circ e = e \circ a = a$ for all $a \in G$.
- $(\cdot)^{-1}$ is an inverse map for $\circ$ with identity $e$. I.e., $a \circ a^{-1} = a^{-1} \circ a = e$ for all $a \in G$.
So basically, an invertible binary operation. Definition in hand, we can see some examples:
- The integers $\mathbb{Z}$ with $+$ as the operation, $0$ as identity, and negation as inverse.
- For any natural number $n$, the integers mod $n$. This can be thought of as the set of numbers $\{0, \dots, n - 1\}$ with the operation being addition followed by taking the remainder mod $n$. It can also be thought of as the group of rotations of an $n$-gon.
- For any set $X$, the set of invertible functions $X \to X$ (a.k.a. permutations) with function composition as the operation, the identity function as the identity, and function inverse as the inverse operation.
- The set of translations of a plane $\mathbb{R}^2$ with composition of translations as the operation (since the composition of translations is a translation). The identity is translating by $(0, 0)$ and the inverse is reversing the translation. This is equivalent to the group $\mathbb{R}^2$ with coordinate-wise addition as the operation and coordinate-wise negation as the inverse.
- The set of rotations of a sphere with composition as the operation, doing nothing as the identity, and performing the rotation in reverse as the inverse map.
- For any field $F$, the set $F^\times := F \setminus \{0\}$ of nonzero field elements, with field multiplication as the operation, $1$ as the identity, and $x \mapsto 1/x$ as the inverse map. This is called the group of units of $F$.

Sometimes, instead of $a \circ b$ we simply write $ab$ when $\circ$ is clear from context.
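The definition above translates almost directly into code. Here is a minimal sketch in Rust (the trait and method names are illustrative, not from any particular library), with the integers under addition as an instance:

```rust
// A minimal sketch of the group interface from the definition above.
trait Group {
    fn op(a: &Self, b: &Self) -> Self; // the binary operation
    fn identity() -> Self;             // the identity element e
    fn inverse(a: &Self) -> Self;      // the inverse map
}

// The integers with addition, 0 as identity, and negation as inverse.
impl Group for i64 {
    fn op(a: &Self, b: &Self) -> Self { a + b }
    fn identity() -> Self { 0 }
    fn inverse(a: &Self) -> Self { -a }
}
```

The group laws are not expressible in the trait itself; an implementation is expected (but not forced) to satisfy them.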
Abelian groups
An abelian group is a group in which $\circ$ is commutative, meaning $a \circ b = b \circ a$. Typically, in that case, we write the group operation as $+$ instead and we write the identity as $0$.
Some examples are
- the integers $\mathbb{Z}$ with addition
- the integers mod $n$ with addition
- the group of units $F^\times$ of a field $F$
- the real numbers $\mathbb{R}$ with addition
- the rational numbers $\mathbb{Q}$ with addition
- the nonzero real numbers $\mathbb{R} \setminus \{0\}$ with multiplication
- the nonzero rational numbers $\mathbb{Q} \setminus \{0\}$ with multiplication
- vectors of integers $\mathbb{Z}^n$ of some fixed dimension $n$ with coordinate-wise addition
- the set $R[x]$ of polynomials over a ring $R$, with addition
Abelian groups are equivalently described as what are called $\mathbb{Z}$-modules. A $\mathbb{Z}$-module is like a vector space, but with the integers instead of a field.

Basically a $\mathbb{Z}$-module (or equivalently an abelian group) is a structure where you can add elements and where it makes sense to multiply a group element by an integer.

If $A$ is an abelian group, we can define this "multiplication by an integer" as follows. If $n \geq 0$, then for $a \in A$, we can define $n \cdot a$ by

$$n \cdot a = \underbrace{a + \dots + a}_{n \text{ times}}$$

and if $n < 0$, then define $n \cdot a = -((-n) \cdot a)$. Or equivalently,

$$n \cdot a = \underbrace{(-a) + \dots + (-a)}_{-n \text{ times}}$$

This is the general sense of what is called scalar multiplication, or sometimes exponentiation, in cryptography.
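As a sketch, here is this repeated-addition definition in Rust for the additive group of integers mod 7 (the modulus is an arbitrary illustrative choice); real implementations use the much faster double-and-add method:

```rust
// The additive group of integers mod 7 (an arbitrary toy modulus).
const MODULUS: u64 = 7;

fn add(a: u64, b: u64) -> u64 { (a + b) % MODULUS }
fn neg(a: u64) -> u64 { (MODULUS - a) % MODULUS }

/// n · a following the definition: a + ... + a (n times),
/// and n · a := (-n) · (-a) when n is negative.
fn scalar_mul(n: i64, a: u64) -> u64 {
    let (times, base) = if n >= 0 { (n as u64, a) } else { ((-n) as u64, neg(a)) };
    let mut acc = 0; // the identity
    for _ in 0..times {
        acc = add(acc, base);
    }
    acc
}
```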
Cyclic groups
A cyclic group is a special kind of abelian group. It is an abelian group generated by a single element $g$. That is, a cyclic group (generated by $g$) is one in which for every $h \in G$ we have $h = n \cdot g$ for some $n \in \mathbb{Z}$.
Groups in cryptography
Many cryptographic protocols are defined generically with respect to any abelian group that has a “hard discrete log”.
Let

- $G$ be a cyclic group
- $g$ a generator of $G$
- $D$ a probability distribution on $G$
- $\mathcal{A}$ a set of algorithms of type $G \to \mathbb{Z}$ with runtime and memory usage bounds. In other words, a set of tuples $(A, t, m)$ where $A$ is a program of type $G \to \mathbb{Z}$, $t$ is a bound on the time that program can be run, and $m$ is a bound on the amount of memory that program can use. In practice you fix this to be something like the set of all computations that can be run for less than 1 trillion dollars.
- $\varepsilon$ a probability, usually taken to be something close to 0, like $2^{-128}$

Then we can define the $(G, g, D, \mathcal{A}, \varepsilon)$-computational discrete-log assumption, which says:

For any $(A, t, m) \in \mathcal{A}$, if you sample $h$ from $G$ according to $D$, then the probability that $A(h) \cdot g = h$ is smaller than $\varepsilon$, assuming $A$ successfully runs within the given resource bounds.
Sometimes this is called the computational Diffie–Hellman assumption. Basically what it's saying is that for a group element $h$ sampled randomly, you can't practically compute how many times you have to add $g$ to itself to get $h$.
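To make the assumption concrete, here is a Rust sketch of the attack it rules out: a brute-force scan for the discrete log, here in the multiplicative group of $\mathbb{F}_{1009}$ (a toy prime chosen only for illustration). At this size the scan is instant; at group sizes around $2^{255}$ it is utterly infeasible:

```rust
// Toy multiplicative group: nonzero integers mod the prime 1009.
const P: u64 = 1009;

fn mul(a: u64, b: u64) -> u64 { a * b % P }

/// Find n such that g^n = h by trying every exponent in turn.
/// This linear scan is exactly what the discrete-log assumption
/// says must be infeasible at cryptographic group sizes.
fn brute_force_dlog(g: u64, h: u64) -> Option<u64> {
    let mut acc = 1; // g^0
    for n in 0..P - 1 {
        if acc == h {
            return Some(n);
        }
        acc = mul(acc, g);
    }
    None
}
```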
Another really important assumption is the no-relation assumption (TODO: name).

Basically what this says is that randomly selected group elements have no efficiently computable linear relations between them. Formally, let $G$ be a group, $D$ a probability distribution on $G^n$, and $\mathcal{A}$ a set of programs (with resource bounds) that take in a list of group elements as inputs and output a list of integers of the same length.

Then the $(G, D, \mathcal{A}, \varepsilon)$-no-relation assumption says: for any $(A, t, m) \in \mathcal{A}$, if you sample $(g_1, \dots, g_n)$ according to $D$, letting $(e_1, \dots, e_n) = A(g_1, \dots, g_n)$, it is not the case that

$$e_1 \cdot g_1 + \dots + e_n \cdot g_n = 0$$

with some $e_i \neq 0$, except with probability $\varepsilon$ (assuming the program runs in time at most $t$ with memory at most $m$).
Generic group model
Elliptic curves
Now, what are some concrete groups that we can safely make the no-relation or computational discrete log hardness assumptions about?
Well, the most efficiently implementable groups that people have come up with – and that we believe satisfy the above assumptions for $\mathcal{A}$ being the class of realistic computations and $\varepsilon$ being something like $2^{-128}$ – are elliptic curves over finite fields.
Giving a complete definition of what an elliptic curve is requires a lot of math, and is not very useful from the point of view of cryptography. So we will give a definition that is not complete, but more useful.
An elliptic curve over a field $F$ is a set of the form

$$\{ (x, y) \in F \times F : y^2 = x^3 + ax + b \}$$

for some $a, b \in F$, plus an additional point $O$ which is called the point at infinity. Basically it's a set of pairs $(x, y)$ satisfying some equation of that form. The data of the elliptic curve itself is just the field $F$ together with the constants $a$ and $b$.

What is special about elliptic curves is that there is a way of giving them a group structure with simple-to-compute operations that we believe satisfy the hardness assumptions described above.

Group negation is defined by

$$-(x, y) := (x, -y)$$

so we just negate the $y$ coordinate.

The identity for the group is $O$, the point at infinity. For that point we may also therefore write it as $0$.
Group addition is more complicated to define, so we will not define it fully, but here is what's worth knowing about how to compute $P + Q$:

- There are three cases:
  - Case 1: $P \neq Q$ and $x_P \neq x_Q$
  - Case 2: $P = Q$
  - Case 3: $P \neq Q$ but $x_P = x_Q$. In this case it turns out that $y_P = -y_Q$, so the two points are inverse to each other, and we return $O$.
- In cases 1 and 2, the algorithm to compute the result just performs a simple sequence of field operations (multiplications, additions, divisions, subtractions) with the input values. In other words, there is a simple arithmetic formula for computing the result.
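As an illustration, here is a Rust sketch of the three-case addition algorithm in affine coordinates over a toy prime field. The prime 101 and the curve $y^2 = x^3 + 3$ are arbitrary choices for illustration; the formulas in cases 1 and 2 are the standard chord and tangent formulas:

```rust
// Toy prime field and curve y^2 = x^3 + 0x + 3 over F_101 (illustrative only).
const P: u64 = 101;
const A: u64 = 0; // the curve coefficient a

// The point at infinity O is represented as None.
type Point = Option<(u64, u64)>;

fn add_f(a: u64, b: u64) -> u64 { (a + b) % P }
fn sub_f(a: u64, b: u64) -> u64 { (a + P - b) % P }
fn mul_f(a: u64, b: u64) -> u64 { a * b % P }
fn pow_f(mut b: u64, mut e: u64) -> u64 {
    let mut acc = 1;
    while e > 0 {
        if e & 1 == 1 { acc = mul_f(acc, b); }
        b = mul_f(b, b);
        e >>= 1;
    }
    acc
}
fn inv_f(a: u64) -> u64 { pow_f(a, P - 2) } // Fermat's little theorem

fn ec_add(p: Point, q: Point) -> Point {
    match (p, q) {
        (None, q) => q, // O + Q = Q
        (p, None) => p, // P + O = P
        (Some((x1, y1)), Some((x2, y2))) => {
            if x1 == x2 && add_f(y1, y2) == 0 {
                return None; // case 3: Q = -P, return O
            }
            let lambda = if x1 == x2 {
                // case 2 (doubling): λ = (3x² + a) / 2y
                mul_f(add_f(mul_f(3, mul_f(x1, x1)), A), inv_f(mul_f(2, y1)))
            } else {
                // case 1 (chord): λ = (y2 - y1) / (x2 - x1)
                mul_f(sub_f(y2, y1), inv_f(sub_f(x2, x1)))
            };
            let x3 = sub_f(sub_f(mul_f(lambda, lambda), x1), x2);
            let y3 = sub_f(mul_f(lambda, sub_f(x1, x3)), y1);
            Some((x3, y3))
        }
    }
}
```

Note the field division hiding inside `inv_f`; avoiding it is exactly the motivation for the projective forms discussed below.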
Elliptic curves in Rust
Elliptic curves in Sage
Serializing curve points
Given a curve point $(x, y)$, we know that $y^2 = x^3 + ax + b$ and thus $y$ is one of the two square roots of $x^3 + ax + b$. If $s$ is a given square root, the other square root is $-s$ since $(-s)^2 = s^2$. So, if we know $x$, then we almost know the whole curve point: we just need a single bit to tell us which of the two possible values (i.e., the two square roots of $x^3 + ax + b$) is the $y$ coordinate.

Here is how this is commonly done over prime order fields $\mathbb{F}_p$, assuming $p > 2$. Remember that we represent elements of $\mathbb{F}_p$ as integers in the range $[0, p - 1]$. In this representation, for a field element $x$, $-x$ is the integer $p - x$ (or $0$ if $x = 0$). Thus, if $x$ is odd, then $-x$ is even (since $p$ is odd and an odd minus an odd is even). Similarly, if $x$ is even and non-zero, then $-x$ is odd (since an odd minus an even is odd).

So, for any $y$, unless $y$ is 0, $y$ and $-y$ have opposite parities. Parity can easily be computed: it's just the least significant bit. Thus, we have an easy way of encoding a curve point $(x, y)$. Send

- $x$ in its entirety
- the least significant bit $b$ of $y$

Given this, we can reconstruct $y$ as follows.

- Compute any square root $s$ of $x^3 + ax + b$
- If the least significant bit of $s$ is equal to $b$, then $y = s$; otherwise, $y = -s$
In the case of Mina, our field elements require 255 bits to store. Thus, we can encode a curve point in $255 + 1 = 256$ bits, or 32 bytes. At the moment this optimized encoding is not implemented for messages on the network. It would be a welcome improvement.
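Here is a toy Rust sketch of this compress/decompress scheme over the illustrative prime $p = 103$, chosen because $p \equiv 3 \pmod 4$ makes square roots easy to compute ($\sqrt{a} = a^{(p+1)/4}$ when $a$ is a square); the curve $y^2 = x^3 + 3$ is likewise an arbitrary choice. A real implementation would serialize $x$ as bytes and pack the parity bit alongside it:

```rust
// Toy prime with p ≡ 3 (mod 4) and curve y^2 = x^3 + 3 (illustrative only).
const P: u64 = 103;
const B: u64 = 3;

fn pow_f(mut b: u64, mut e: u64) -> u64 {
    let mut acc = 1;
    while e > 0 {
        if e & 1 == 1 { acc = acc * b % P; }
        b = b * b % P;
        e >>= 1;
    }
    acc
}

/// The right-hand side x^3 + b, whose square roots are ±y.
fn rhs(x: u64) -> u64 { (pow_f(x, 3) + B) % P }

/// Send x in full plus the least significant bit of y.
fn compress(x: u64, y: u64) -> (u64, bool) { (x, y & 1 == 1) }

fn decompress(x: u64, parity: bool) -> (u64, u64) {
    // For p ≡ 3 (mod 4), a^((p+1)/4) is a square root of a when one exists.
    let s = pow_f(rhs(x), (P + 1) / 4);
    // Pick whichever of the two roots s, p - s has the recorded parity.
    if (s & 1 == 1) == parity { (x, s) } else { (x, (P - s) % P) }
}
```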
Projective form / projective coordinates
The above form of representing curve points as pairs $(x, y)$ – which is called affine form – is sometimes sub-optimal from an efficiency perspective. There are several other forms in which curve points can be represented, which are called projective forms.
The simple projective form represents a curve $y^2 = x^3 + ax + b$ as above as the set

$$\{ (x, y, z) \in F^3 : z \neq 0,\ (y / z)^2 = (x / z)^3 + a (x / z) + b \}$$

If you think about it, this is saying that $(x / z, y / z)$ is a point on the original curve in affine form. In other words, in projective form we let the first two coordinates get scaled by some arbitrary scaling factor $z$, but we keep track of it as the third coordinate.

To be clear, this means curve points have many different representations. If $(x, y, z)$ is a curve point in projective coordinates, and $t$ is any non-zero element of $F$, then $(tx, ty, tz)$ is another representation of the same curve point.
This means curve points require more space to store, but it makes the group operation much more efficient to compute, as we can avoid having to do any field divisions, which are expensive.
Jacobian form / Jacobian coordinates
There is another form, which is also sometimes called a projective form, known as the Jacobian form. There, a curve $y^2 = x^3 + ax + b$ would be represented as

$$\{ (x, y, z) \in F^3 : z \neq 0,\ (y / z^3)^2 = (x / z^2)^3 + a (x / z^2) + b \}$$

so the triple $(x, y, z)$ corresponds to the affine point $(x / z^2, y / z^3)$. These are the fastest for computing group addition on a normal computer.
Takeaways

- Use affine coordinates when the cost of division doesn't matter and saving space is important. Specific contexts to keep in mind:
  - Working with elliptic curve points inside a SNARK circuit
- Use Jacobian coordinates for computations on normal computers, or other circumstances where space usage is not that costly and division is expensive.
- Check out this website for the formulas for implementing the group operation.
More things to know
- When cryptographers talk about “groups”, they usually mean a “computational group”, which is a group equipped with efficient algorithms for computing the group operation and inversion. This is different because a group in the mathematical sense need not have its operations be computable at all.
Exercises
- Implement a type `JacobianPoint<F: Field>` and functions for computing the group operation
- Familiarize yourself with the types and traits in `ark_ec`. Specifically,
  - todo
- Implement `fn decompress<F: SquareRootField>(c: (F, bool)) -> (F, F)`
Rings
A ring is like a field, but where elements may not be invertible. Basically, it's a structure where we can

- add
- multiply
- subtract

but not necessarily divide. If you know what polynomials are already, you can think of it as the minimal necessary structure for polynomials to make sense. That is, if $R$ is a ring, then we can define the set $R[x]$ of polynomials over $R$ (basically arithmetic expressions in the variable $x$) and think of any polynomial $f \in R[x]$ as giving rise to a function $R \to R$ defined by substituting a value in for $x$ in $f$ and computing using $+$ and $\cdot$ as defined in $R$.
So, in full, a ring is a set $R$ equipped with elements $0, 1 \in R$ and operations $+, \cdot$ such that

- $(+, 0, -)$ gives $R$ the structure of an abelian group
- $\cdot$ is associative and commutative with identity $1$
- $\cdot$ distributes over $+$. I.e., $a \cdot (b + c) = a \cdot b + a \cdot c$ for all $a, b, c \in R$.
Intro
The purpose of this document is to give the reader mathematical, cryptographic, and programming context sufficient to become an effective practitioner of zero-knowledge proofs and ZK-SNARKs specifically.
Some fundamental mathematical objects
In this section we’ll discuss the fundamental objects used in the construction of most ZK-SNARKs in use today. These objects are used extensively and without ceremony, so it’s important to understand them very well.
If you find you don’t understand something at first, don’t worry: practice is often the best teacher and in using these objects you will find they will become like the bicycle you’ve had for years: like an extension of yourself that you use without a thought.
Fields
A field is a generalization of the concept of a number. To be precise, a field is a set $F$ equipped with

- An element $0 \in F$
- An element $1 \in F$
- A function $\mathsf{add} : F \times F \to F$
- A function $\mathsf{mul} : F \times F \to F$
- A function $\mathsf{sub} : F \times F \to F$
- A function $\mathsf{div} : F \times (F \setminus \{0\}) \to F$

Note that the second argument to $\mathsf{div}$ cannot be $0$. We write these functions in the traditional infix notation, writing

- $a + b$ for $\mathsf{add}(a, b)$
- $a \cdot b$ or $ab$ for $\mathsf{mul}(a, b)$
- $a - b$ for $\mathsf{sub}(a, b)$
- $a / b$ for $\mathsf{div}(a, b)$

and we also write $-a$ for $0 - a$ and $a^{-1}$ for $1 / a$.
Moreover all these elements and functions must obey all the usual laws of arithmetic, such as

- $a + b = b + a$
- $(a + b) + c = a + (b + c)$
- $a + 0 = a$
- $a - a = 0$
- $a \cdot b = b \cdot a$
- $(a \cdot b) \cdot c = a \cdot (b \cdot c)$
- $a \cdot 1 = a$
- $a / b = c$ if and only if $a = c \cdot b$, assuming $b \neq 0$.
- $a \cdot (b + c) = a \cdot b + a \cdot c$

In short, $F$ should be an abelian group over $+$ with $0$ as identity and negation as inverse, $F \setminus \{0\}$ should be an abelian group over $\cdot$ with $1$ as identity and $a \mapsto 1/a$ as inverse, and multiplication should distribute over addition. If you don't know what an abelian group is, don't worry, we will cover it later.
The point of defining a field is that we can algebraically manipulate elements of a field the same way we do with ordinary numbers, adding, multiplying, subtracting, and dividing them without worrying about rounding, underflows, overflows, etc.
In Rust, we use the trait `Field` to represent types that are fields. So, if we have `T : Field` then values of type `T` can be multiplied, subtracted, divided, etc.
Examples
The most familiar examples of fields are the real numbers $\mathbb{R}$ and the rational numbers $\mathbb{Q}$ (ratios of integers). Some readers may also be friends with the complex numbers $\mathbb{C}$, which are also a field.

The fields that we use in ZKPs, however, are different. They are finite fields. A finite field is a field with finitely many elements. This is in distinction to the fields $\mathbb{Q}$, $\mathbb{R}$, and $\mathbb{C}$, which all have infinitely many elements.
What are finite fields like?
In this section we’ll try to figure out from first principles what a finite field should look like. If you just want to know the answer, feel free to skip to the next section.
Let’s explore what a finite field can possibly be like. Say is a finite field. If is a natural number like or we can imagine it as an element of by writing .
Since is finite it must be the case that we can find two distinct natural numbers which are the same when interpreted as elements of .
Then, , which means the element is . Now that we have established that if you add to itself enough times you get , let be the least natural number such that if you add to itself times you get .
Now let’s show that is prime. Suppose it’s not, and . Then since in , . It is a fact about fields (exercise) that if then either is 0 or is 0. Either way, we would then get a natural number smaller than which is equal to in , which is not possible since is the smallest such number. So we have demonstrated that is prime.
The smallest number of times you have to add to itself to get 0 is called the characteristic of the field . So, we have established that every finite field has a characteristic and it is prime.
This gives us a big hint as to what finite fields should look like.
Prime order fields
With the above in hand, we are ready to define the simplest finite fields, which are fields of prime order (also called prime order fields).
Let $p$ be a prime. Then $\mathbb{F}_p$ (pronounced "eff pee" or "the field of order $p$") is defined as the field whose elements are the set of natural numbers $\{0, 1, \dots, p - 1\}$.

- $0$ is defined to be $0$
- $1$ is defined to be $1$
- $a + b := (a + b) \bmod p$
- $a \cdot b := (a \cdot b) \bmod p$
- $a - b := (a - b) \bmod p$
- $a / b := (a \cdot b^{p - 2}) \bmod p$

Basically, you just do arithmetic operations normally, except you take the remainder after dividing by $p$. This is with the exception of division, which has a funny definition. Actually, a more efficient algorithm is used in practice to calculate division, but the above is the most succinct way of writing it down.
If you want, you can try to prove that the above definition of division makes the required equations hold, but we will not do that here.
Let’s work out a few examples.
2 is a prime, so there is a field $\mathbb{F}_2$ whose elements are $\{0, 1\}$. The only surprising equation we have is

$$1 + 1 = 0$$

Addition is XOR and multiplication is AND.

Let's do a more interesting example. 5 is a prime, so there is a field $\mathbb{F}_5$ whose elements are $\{0, 1, 2, 3, 4\}$. We have

$$1 / 2 = 1 \cdot 2^{5 - 2} = 2^3 = 8 = 3$$

where the last equality follows because everything is mod 5.

We can confirm that 3 is in fact $1 / 2$ by multiplying 3 and 2 and checking the result is 1:

$$3 \cdot 2 = 6 = 1$$

so that checks out.
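The definitions above can be transcribed directly into Rust. This sketch uses `u64` arithmetic and works for any prime small enough to avoid overflow (the tests use $p = 5$); real implementations use multi-limb representations and faster inversion algorithms:

```rust
// Arithmetic in F_p for a small prime p (here p = 5, matching the example).
const P: u64 = 5;

fn add(a: u64, b: u64) -> u64 { (a + b) % P }
fn sub(a: u64, b: u64) -> u64 { (a + P - b) % P }
fn mul(a: u64, b: u64) -> u64 { a * b % P }

// Square-and-multiply exponentiation.
fn pow(mut b: u64, mut e: u64) -> u64 {
    let mut acc = 1;
    while e > 0 {
        if e & 1 == 1 { acc = mul(acc, b); }
        b = mul(b, b);
        e >>= 1;
    }
    acc
}

// Division via a · b^(p-2): the "funny definition" from the text.
fn div(a: u64, b: u64) -> u64 { mul(a, pow(b, P - 2)) }
```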
In cryptography, we typically work with much larger finite fields. There are two ways to get a large finite field.
- Pick a large prime $p$ and take the field $\mathbb{F}_p$. This is what we do in Mina, where we use fields in which $p$ is on the order of $2^{255}$.
- Take an extension of a small finite field. We may expand this document to talk about field extensions later, but it does not discuss them now.
Algorithmics of prime order fields
For a finite field $\mathbb{F}_p$ where $p$ fits in $b$ bits (i.e., $p < 2^b$) we have

- Addition, subtraction: $O(b)$
- Multiplication: $O(b^2)$
- Division: $O(b^2)$ I believe, in practice significantly slower than multiplication.

Actually, on a practical level, it's more accurate to model the complexity in terms of the number of limbs rather than the number of bits, where a limb is 64 bits. Asymptotically it makes no difference, but concretely it's better to think about the number of limbs for the most part.

As a result you can see that the smaller $p$ is, the better, especially with respect to multiplication, which dominates performance considerations for implementations of zk-SNARKs, since they are dominated by elliptic curve operations that consist of field operations.

While still in development, Mina used to use a field of 753 bits or 12 limbs, and now uses a field of 255 bits or 4 limbs. As a result, field multiplication became automatically sped up by a factor of $(12 / 4)^2 = 9$, so you can see it's very useful to try to shrink the field size.
Polynomials
The next object to know about is polynomials. A polynomial is a syntactic object that also defines a function.
Specifically, let $R$ be a field (or more generally it could even be a ring, which is like a field but without necessarily having division, like the integers $\mathbb{Z}$). And pick a variable name like $x$. Then $R[x]$ (pronounced "R adjoin x" or "polynomials in x over R" or "polynomials in x with R coefficients") is the set of syntactic expressions of the form

$$a_0 + a_1 x + a_2 x^2 + \dots + a_d x^d$$

where each $a_i \in R$ and $d$ is any natural number. If we wanted to be very formal about it, we could define $R[x]$ to be the set of lists of elements of $R$ (the lists of coefficients), and the above is just a suggestive way of writing that list.

An important fact about polynomials over $R$ is that they can be interpreted as functions $R \to R$. In other words, there is a function $\mathsf{eval} : R[x] \to (R \to R)$ defined by

$$\mathsf{eval}(a_0 + a_1 x + \dots + a_d x^d) := y \mapsto a_0 + a_1 y + \dots + a_d y^d$$

Here $x$ is just a variable name without a value assigned to it, while the resulting function takes an actual field element $y$ as input; the output is the inner product between the list of coefficients and the powers of $y$.

It is important to remember that polynomials are different than the functions that they compute. Polynomials are really just lists of coefficients. Over some fields, there are polynomials $f$ and $g$ such that $f \neq g$ as polynomials, but $\mathsf{eval}(f) = \mathsf{eval}(g)$.

For example, consider in $\mathbb{F}_2[x]$ the polynomials $x$ and $x^2$. These map to the same function (meaning for all $a \in \mathbb{F}_2$ it holds that $\mathsf{eval}(x)(a) = \mathsf{eval}(x^2)(a)$), but they are distinct polynomials.
Some definitions and notation
If $f$ is a polynomial in $x$ and $a \in R$, we write $f(a)$ for $\mathsf{eval}(f)(a)$.

If $f$ is a polynomial, the degree of $f$ is the largest $d$ for which the coefficient of $x^d$ is non-zero in $f$. For example, the degree of $x^{10} + x$ is 10.

We will use the notations $R[x]_{< d}$ and $R[x]_{\leq d}$ for the sets of polynomials of degree less-than and less-than-or-equal to $d$ respectively.

Polynomials can be added (by adding the coefficients) and multiplied (by carrying out the multiplication formally and collecting like terms to reduce to a list of coefficients). Thus $R[x]$ is a ring.
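As a sketch, here are the two ring operations on dense coefficient vectors, over the integers for simplicity (over a field $\mathbb{F}_p$ each arithmetic operation would additionally reduce mod $p$):

```rust
// Dense coefficient vectors: index i holds the coefficient of x^i.

fn poly_add(f: &[i64], g: &[i64]) -> Vec<i64> {
    let n = f.len().max(g.len());
    (0..n)
        .map(|i| f.get(i).copied().unwrap_or(0) + g.get(i).copied().unwrap_or(0))
        .collect()
}

// The naive "multiply formally and collect like terms" algorithm: O(d²).
fn poly_mul(f: &[i64], g: &[i64]) -> Vec<i64> {
    let mut out = vec![0; f.len() + g.len() - 1];
    for (i, a) in f.iter().enumerate() {
        for (j, b) in g.iter().enumerate() {
            out[i + j] += a * b; // x^i · x^j contributes to the x^(i+j) term
        }
    }
    out
}
```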
Fundamental theorem of polynomials
An important fact about polynomials, for the purposes of zk-SNARKs, is that we can use them to encode arrays of field elements. In other words, there is a way of converting between polynomials of degree $n - 1$ and arrays of field elements of length $n$.
This is important for a few reasons

- It will allow us to translate statements about arrays of field elements into statements about polynomials, which will let us construct zk-SNARKs.
- It will allow us to perform certain operations on polynomials more efficiently.
So let’s get to defining and proving this connection between arrays of field elements.
The first thing to understand is that we won't be talking about arrays directly in this section. Instead, we'll be talking about functions $A \to F$ where $A$ is a finite subset of $F$. The idea is, if $A$ has size $n$, and we put an ordering on $A$, then we can think of a function $\varphi : A \to F$ as the same thing as an immutable array of length $n$, since such a function can be thought of as returning the value of an array at the input position.

With this understanding in hand, we can start describing the "fundamental theorem of polynomials". If $A \subseteq F$ has size $n$, this theorem will define an isomorphism between functions $A \to F$ and $F[x]_{< n}$, the set of polynomials of degree at most $n - 1$.
Now let’s start defining this isomorphism.
One direction of it is very easy to define. It is none other than the evaluation map, restricted to $A$:

$$\mathsf{eval}_A : F[x]_{< n} \to (A \to F)$$
$$\mathsf{eval}_A(f) := a \mapsto f(a)$$

We would now like to construct an inverse to this map. What would that mean? It would be a function $\mathsf{interp}_A$ that takes as input a function $\varphi : A \to F$ (remember, basically an array of length $n$), and returns a polynomial which agrees with $\varphi$ on the set $A$. In other words, we want to construct a polynomial that interpolates between the points $(a, \varphi(a))$ for $a \in A$.

Our strategy for constructing this polynomial will be straightforward. For each $a \in A$, we will construct a polynomial $L_a$ that is equal to $1$ when evaluated at $a$, and equal to $0$ when evaluated at any other element of $A$.

Then, our final polynomial will be $\sum_{a \in A} \varphi(a) \cdot L_a$. When this polynomial is evaluated at $a \in A$, only the term $\varphi(a) \cdot L_a$ will be non-zero (and equal to $\varphi(a)$ as desired); all the other terms will disappear.
Constructing the interpolation map requires a lemma.
Lemma. (construction of vanishing polynomials)

Let $S \subseteq F$. Then there is a polynomial $v_S$ of degree $|S|$ such that $v_S$ evaluates to $0$ on $S$, and is non-zero off of $S$. $v_S$ is called the vanishing polynomial of $S$.

Proof. Let $v_S := \prod_{s \in S} (x - s)$. Clearly $v_S$ has degree $|S|$. If $s \in S$, then $v_S(s) = 0$, since $(x - s)$ is one of the terms in the product defining $v_S$, so when we evaluate at $s$, $(s - s) = 0$ is one of the factors. If $t \notin S$, then all the factors $(t - s)$ in the product are non-zero, and thus $v_S(t)$ is non-zero as well.

Now we can define the inverse map. Define

$$\mathsf{interp}_A(\varphi) := \sum_{a \in A} \varphi(a) \cdot \frac{v_{A \setminus \{a\}}}{v_{A \setminus \{a\}}(a)}$$

Since each $v_{A \setminus \{a\}}$ has degree $n - 1$, this polynomial has degree at most $n - 1$. Now we have, for any $b \in A$,

$$\mathsf{interp}_A(\varphi)(b) = \sum_{a \in A} \varphi(a) \cdot \frac{v_{A \setminus \{a\}}(b)}{v_{A \setminus \{a\}}(a)} = \varphi(b)$$

since $v_{A \setminus \{a\}}(b) = 0$ unless $a = b$, in which case the fraction is $1$. Thus $\mathsf{eval}_A \circ \mathsf{interp}_A$ is the identity.
So we have successfully devised a way of interpolating a set of $n$ points with a polynomial of degree at most $n - 1$.
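Here is a Rust sketch of $\mathsf{interp}$ built exactly this way – summing the scaled vanishing polynomials $v_{A \setminus \{a\}} / v_{A \setminus \{a\}}(a)$ – over the toy prime field $\mathbb{F}_{97}$ (an arbitrary choice). This is the $O(n^2)$ Lagrange method, not the faster FFT-based method discussed later:

```rust
// Toy prime field for the sketch; coefficients are dense, index i ↔ x^i.
const P: u64 = 97;

fn add_f(a: u64, b: u64) -> u64 { (a + b) % P }
fn sub_f(a: u64, b: u64) -> u64 { (a + P - b) % P }
fn mul_f(a: u64, b: u64) -> u64 { a * b % P }
fn inv_f(a: u64) -> u64 {
    // Fermat inversion a^(p-2); fine for a sketch.
    let (mut acc, mut b, mut e) = (1, a, P - 2);
    while e > 0 {
        if e & 1 == 1 { acc = mul_f(acc, b); }
        b = mul_f(b, b);
        e >>= 1;
    }
    acc
}

/// Horner evaluation of a dense polynomial at x.
fn eval(f: &[u64], x: u64) -> u64 {
    f.iter().rev().fold(0, |acc, &c| add_f(mul_f(acc, x), c))
}

/// Given points (a_i, φ(a_i)) with distinct a_i, return the unique
/// polynomial of degree < n through them.
fn interp(points: &[(u64, u64)]) -> Vec<u64> {
    let mut result = vec![0; points.len()];
    for (i, &(ai, yi)) in points.iter().enumerate() {
        // Build v_{A \ {a_i}} = prod_{j != i} (x - a_j) and its value at a_i.
        let mut basis = vec![1u64];
        let mut denom = 1;
        for (j, &(aj, _)) in points.iter().enumerate() {
            if i == j { continue; }
            // Multiply basis by (x - a_j).
            let mut next = vec![0; basis.len() + 1];
            for (k, &c) in basis.iter().enumerate() {
                next[k + 1] = add_f(next[k + 1], c);              // c · x
                next[k] = add_f(next[k], mul_f(c, sub_f(0, aj))); // c · (-a_j)
            }
            basis = next;
            denom = mul_f(denom, sub_f(ai, aj)); // accumulates v(a_i)
        }
        // Add φ(a_i) · v / v(a_i) into the result.
        let scale = mul_f(yi, inv_f(denom));
        for (k, &c) in basis.iter().enumerate() {
            result[k] = add_f(result[k], mul_f(scale, c));
        }
    }
    result
}
```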
What remains to be seen is that these two maps are inverse in the other direction. That is, that for any polynomial $f \in F[x]_{< n}$, we have

$$\mathsf{interp}_A(\mathsf{eval}_A(f)) = f$$

This says that if we interpolate the function that has the values of $f$ on $A$, we get back $f$ itself. This essentially says that there is only one way to interpolate a set of $n$ points with a degree $n - 1$ polynomial. So let's prove it by proving that statement.
Lemma: polynomials that agree on enough points are equal.

Let $F$ be a field and suppose $f, g \in F[x]$ have degree at most $n - 1$. Let $A \subseteq F$ have size $n$.

Suppose for all $a \in A$, $f(a) = g(a)$. Then $f = g$ as polynomials.

Proof. Define $h := f - g$. Our strategy will be to show that $h$ is the zero polynomial. This would then imply that $f = g$ as polynomials. Note that $h$ vanishes on all of $A$, since $f$ and $g$ are equal on $A$.

To that end, let $a \in A$. Then we can apply the polynomial division algorithm to divide $h$ by $x - a$ and obtain polynomials $q, r$ such that $h = q \cdot (x - a) + r$ and $r$ has degree less than 1. I.e., $r$ is a constant in $F$.

Now, $0 = h(a) = q(a) \cdot (a - a) + r = r$, so $r = 0$ and thus $h = q \cdot (x - a)$.

Note that $q$ is 0 on all elements $a' \in A$ with $a' \neq a$, since $0 = h(a') = q(a') \cdot (a' - a)$, but $a' - a \neq 0$.

Thus, if we iterate the above, enumerating the elements of $A$ as $a_1, \dots, a_n$, we find that

$$h = q' \cdot (x - a_1) \cdots (x - a_n)$$

for some polynomial $q'$. Now, if $q'$ is not the zero polynomial, then $h$ will have degree at least $n$, since it would have as a factor the linear terms $(x - a_1) \cdots (x - a_n)$. But since $f, g$ both have degree at most $n - 1$, $h = f - g$ has degree at most $n - 1$ as well. Thus $q' = 0$, which means $h = 0$ as well.

This gives us the desired conclusion, that $f = g$ as polynomials.
Now we can easily show that interpolating the evaluations of a polynomial yields that polynomial itself. Let $f \in F[x]_{< n}$. Then $\mathsf{interp}_A(\mathsf{eval}_A(f))$ is a polynomial of degree at most $n - 1$ that agrees with $f$ on $A$, a set of size $n$. Thus, by the lemma, they are equal as polynomials. So indeed $\mathsf{interp}_A(\mathsf{eval}_A(f)) = f$ for all $f \in F[x]_{< n}$.

So far we have proven that $\mathsf{eval}_A$ and $\mathsf{interp}_A$ give an isomorphism of sets (i.e., a bijection) between the sets $F[x]_{< n}$ and $A \to F$.

But we can take this theorem a bit further. The set of functions $A \to F$ can be given the structure of a ring, where addition and multiplication happen pointwise. I.e., for $\varphi_1, \varphi_2 : A \to F$ we define $\varphi_1 + \varphi_2 := a \mapsto \varphi_1(a) + \varphi_2(a)$ and $\varphi_1 \cdot \varphi_2 := a \mapsto \varphi_1(a) \cdot \varphi_2(a)$. Then we can strengthen our theorem to say
Fundamental theorem of polynomials (final version)
Let $n \in \mathbb{N}$ and let $A \subseteq F$ with $|A| = n$. With

$$\mathsf{eval}_A : F[x]_{< n} \to (A \to F)$$
$$\mathsf{interp}_A : (A \to F) \to F[x]_{< n}$$

defined as above, these two functions define an isomorphism of rings.

That is, they are mutually inverse and each one respects addition, subtraction and multiplication.

The fundamental theorem of polynomials is very important when it comes to computing operations on polynomials. As we will see in the next section, the theorem will help us to compute the product of degree $n$ polynomials in time $O(n \log n)$, whereas the naive algorithm takes time $O(n^2)$. To put this in perspective, if $n = 2^{16}$, $n^2$ is about $4{,}000$ times larger than $n \log n$, and the gap only gets bigger as $n$ grows.
Schwartz–Zippel lemma
Computer representation
There are three common representations for polynomials used in computer implementations.
- Dense coefficient form. A degree $d$ polynomial is represented as a vector of length $d + 1$ of all the coefficients. Entry $i$ of the vector corresponds to the coefficient $a_i$ of $x^i$. This corresponds to the `DensePolynomial` type in arkworks. Sometimes this is also described as writing the polynomial "in the monomial basis", because it amounts to writing the polynomial as a linear combination of the monomials $x^i$.
- Sparse coefficient form. If a polynomial does not have very many non-zero coefficients, the above representation is wasteful. In sparse coefficient form, you represent a polynomial as a vector (or potentially a hash-map) of pairs `(usize, F)` where `F` is the type of coefficients. The polynomial corresponding to the list `[(i_0, b_0), ..., (i_n, b_n)]` is

  $$b_0 x^{i_0} + b_1 x^{i_1} + \dots + b_n x^{i_n}$$

- Evaluation form. We fix an index set $A = \{a_0, \dots, a_d\}$, with $|A| = d + 1$, and represent a polynomial $f \in F[x]_{\leq d}$ as the vector `[f(a_0), ..., f(a_d)]`. By the fundamental theorem of polynomials, this is a valid way of representing the polynomials, since the coefficient form can be obtained by using the $\mathsf{interp}_A$ function.
The evaluation form is very important. The reason is that multiplying two polynomials in evaluation form takes time $O(n)$: you just multiply the two vectors entry-wise. By contrast, the coefficient forms naively require time $O(n^2)$ to multiply.

Now, there is a trick. For certain sets $A$, we can efficiently translate between the dense coefficient form and the evaluation form. That is, for certain $A$, the functions $\mathsf{eval}_A$ and $\mathsf{interp}_A$ can be computed more efficiently than $O(n^2)$.
Multiplying polynomials
The algorithm that allows us to multiply polynomials in is called the Cooley-Tukey fast Fourier transform, or FFT for short. It was discovered by Gauss 160 years earlier, but then separately rediscovered and publicized by Cooley-Tukey.
The heart of the Cooley-Tukey FFT is actually about converting between coefficient and evaluation representations, rather than the multiplication itself. Given polynomials $f$ and $g$ in dense coefficient representation, it works like this.

- Convert $f$ and $g$ from coefficient to evaluation form in $O(n \log n)$ using the Cooley-Tukey FFT
- Compute $h = f \cdot g$ in evaluation form by multiplying their points pairwise in $O(n)$
- Convert $h$ back to coefficient form in $O(n \log n)$ using the inverse Cooley-Tukey FFT
The key observation is that we can choose any $n$ distinct evaluation points to represent any degree $n - 1$ polynomial. The Cooley-Tukey FFT works by selecting points that yield an efficient FFT algorithm. These points are fixed and work for any polynomial of a given degree.
The next section describes the Cooley-Tukey FFT in detail.
Fast Fourier Transform (FFT)
This section describes how the Cooley-Tukey fast Fourier transform works. As we learned in the previous section, the key is to select evaluation points that yield an efficient FFT algorithm.
Specifically, say we have $\omega \in F$ such that $\omega^n = 1$, and $\omega^i \neq 1$ for any $0 < i < n$.

Put another way, all the values $\omega^0, \omega^1, \dots, \omega^{n - 1}$ are distinct and $\omega^n = 1$.

Put yet another way, the group generated by $\omega$ inside $F^\times$ (written $\langle \omega \rangle$) has size $n$.

We call such an $\omega$ a primitive $n$-th root of unity.

Suppose we have an $\omega$ which is a primitive $2^k$-th root of unity and let $n = 2^k$.

The FFT algorithm will let us compute $\mathsf{eval}_A$ for the set $A = \{1, \omega, \omega^2, \dots, \omega^{n - 1}\}$. Let's see how.
We will describe an algorithm $\mathsf{FFT}$ that takes as input

- a primitive $2^k$-th root of unity $\omega$
- $f \in F[x]_{< 2^k}$ in dense coefficient form (i.e., as a vector of coefficients of length $2^k$)

and outputs the vector of evaluations $[f(1), f(\omega), \dots, f(\omega^{n - 1})]$, and does it in time $O(k \cdot 2^k)$ (which is to say, $O(n \log n)$ if $n = 2^k$).

Notice that naively, computing each evaluation using the coefficients of $f$ would require time $O(n)$, and so computing all $n$ of them would require time $O(n^2)$.
The algorithm can be defined recursively as follows.
If $k = 0$, then $\omega$ is a primitive $1$st root of unity, and $f$ is a polynomial of degree $0$. That means $\omega = 1$ and also $f$ is a constant $c$. So, we can immediately output the array of evaluations $[c]$.
If $k > 0$, then we will split $f$ into two polynomials, recursively call $\mathsf{FFT}$ on them, and reconstruct the result from the recursive calls.

To that end, define $f_0$ to be the polynomial whose coefficients are all the even-index coefficients of $f$ and $f_1$ the polynomial whose coefficients are all the odd-index coefficients of $f$. In terms of the array representation, this just means splitting out every other entry into two arrays. So that can be done in time $O(n)$.

Write $f = \sum_{i < n} a_i x^i$, so that $f_0 = \sum_{i < n/2} a_{2i} x^i$ and $f_1 = \sum_{i < n/2} a_{2i + 1} x^i$. Then

$$f(x) = f_0(x^2) + x \cdot f_1(x^2)$$

Now, notice that if $\omega$ is a $2^k$-th root of unity, then $\omega^2$ is a $2^{k - 1}$-th root of unity. Thus we can recurse with $\mathsf{FFT}(\omega^2, f_0)$ and similarly for $f_1$. Let

$$[e_0, \dots, e_{n/2 - 1}] = \mathsf{FFT}(\omega^2, f_0)$$
$$[o_0, \dots, o_{n/2 - 1}] = \mathsf{FFT}(\omega^2, f_1)$$

By assumption $e_i = f_0(\omega^{2i})$ and $o_i = f_1(\omega^{2i})$ for $i < n/2$. So, for any $i < n$ we have

$$f(\omega^i) = f_0(\omega^{2i}) + \omega^i \cdot f_1(\omega^{2i})$$

Now, since $2i$ may be larger than $n$, we need to reduce it mod $n$, relying on the fact that if $\omega$ is an $n$-th root of unity then $\omega^{2i} = \omega^{2i \bmod n}$ since $\omega^n = 1$. Thus $f_0(\omega^{2i}) = e_{i \bmod n/2}$ and $f_1(\omega^{2i}) = o_{i \bmod n/2}$, and so we have

$$f(\omega^i) = e_{i \bmod n/2} + \omega^i \cdot o_{i \bmod n/2}$$

We can compute the array $[\omega^0, \omega^1, \dots, \omega^{n - 1}]$ in time $O(n)$ (since each entry is the previous entry times $\omega$). Then we can compute each entry of the output in $O(1)$ as

$$f(\omega^i) = e_{i \bmod n/2} + \omega^i \cdot o_{i \bmod n/2}$$

There are $n$ such entries, so this takes time $O(n)$.
This concludes the recursive definition of the algorithm $\mathsf{FFT}$.
Algorithm: computing $\mathsf{FFT}(\omega, f)$

- $[a_0, \dots, a_{n - 1}]$: the coefficients of the polynomial $f$
- $[a_0, a_2, \dots, a_{n - 2}]$: the even-index coefficients of $f$, corresponding to $f_0$
- $[a_1, a_3, \dots, a_{n - 1}]$: the odd-index coefficients of $f$, corresponding to $f_1$
Now let’s analyze the time complexity. Let be the complexity on an instance of size (that is, for ).
Looking back at what we have done, we have done
- for computing and
- two recursive calls, each of size
- for computing the powers of
- for combining the results of the recursive calls
In total, this is . Solving this recurrence yields . Basically, there are recursions before we hit the base case, and each step takes time .
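The recursive algorithm above can be sketched compactly in Rust over the toy field $\mathbb{F}_{17}$, which contains primitive $2^k$-th roots of unity for $k \leq 4$ (for example, 4 is a primitive 4th root of unity):

```rust
// Toy field for the sketch; 17 - 1 = 16, so F_17 has 2^k-th roots of unity.
const P: u64 = 17;

fn mul_f(a: u64, b: u64) -> u64 { a * b % P }
fn add_f(a: u64, b: u64) -> u64 { (a + b) % P }

/// Returns [f(1), f(ω), ..., f(ω^(n-1))] given f's coefficients.
/// `omega` must be a primitive n-th root of unity, n = f.len() a power of two.
fn fft(omega: u64, f: &[u64]) -> Vec<u64> {
    let n = f.len();
    if n == 1 {
        return vec![f[0]]; // base case: a constant polynomial
    }
    // Split into even- and odd-index coefficients: f(x) = f0(x²) + x·f1(x²).
    let f0: Vec<u64> = f.iter().step_by(2).copied().collect();
    let f1: Vec<u64> = f.iter().skip(1).step_by(2).copied().collect();
    // ω² is a primitive (n/2)-th root of unity, so we can recurse.
    let e = fft(mul_f(omega, omega), &f0);
    let o = fft(mul_f(omega, omega), &f1);
    // Combine: f(ω^i) = e[i mod n/2] + ω^i · o[i mod n/2].
    let mut out = Vec::with_capacity(n);
    let mut pw = 1; // ω^i
    for i in 0..n {
        out.push(add_f(e[i % (n / 2)], mul_f(pw, o[i % (n / 2)])));
        pw = mul_f(pw, omega);
    }
    out
}
```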
Now, in practice there are ways to describe this algorithm non-recursively that have better concrete performance, but that’s out of scope for this document. Read the code if you are interested.
Using the FFT algorithm to compute $\mathsf{interp}$
So far we have a fast way to compute all the evaluations $[f(1), f(\omega), \dots, f(\omega^{n - 1})]$ at once, where $A$ is the set of powers of a $2^k$-th root of unity $\omega$. For convenience let $n = 2^k$.

Now we want to go the other way and compute a polynomial given an array of evaluations. Specifically, $n$ evaluations uniquely define a degree $n - 1$ polynomial. This can be written as a system of equations

$$f(a_i) = c_0 + c_1 a_i + c_2 a_i^2 + \dots + c_{n - 1} a_i^{n - 1} \qquad \text{for } i = 0, \dots, n - 1$$

which can be rewritten as a matrix-vector product. This matrix is a Vandermonde matrix, and it just so happens that square Vandermonde matrices are invertible iff the $a_i$ are distinct. Since we purposely selected our $a_i$ to be the powers of $\omega$, a primitive $n$-th root of unity, the $a_i$ are by definition distinct.

Therefore, to compute the polynomial given the corresponding array of evaluations (i.e., interpolation) we can solve for the polynomial's coefficients using the inverse of the matrix. All we need now is the inverse of this matrix, which is slightly complicated to compute. I'm going to skip it for now, but if you have the details please make a pull request.

Substituting in the inverse matrix we obtain the equation for interpolation. Observe that this equation is nearly identical to the original equation for evaluation, except with the substitution $\omega \mapsto \omega^{-1}$ and a final scaling by $1 / n$. Consequently, and perhaps surprisingly, we can reuse the FFT algorithm in order to compute the inverse – $\mathsf{interp}_A$.
So, suppose we have an array $[e_0, \dots, e_{n-1}]$ of field elements (which you can think of as a function $\{\omega^i\}_i \to \mathbb{F}$) and we want to compute the coefficients of a polynomial $f$ with $f(\omega_i) = e_i$.
To this end, define a polynomial $g$ by $g(x) = \sum_i e_i x^i$. That is, the polynomial whose coefficients are the evaluations in our array that we're hoping to interpolate.
Now, let $b = \mathsf{FFT}(g, \omega^{-1})$.
That is, we're going to feed $g$ into the FFT algorithm defined above with $\omega^{-1}$ as the $n$th root of unity. It is not hard to check that if $\omega$ is an $n$-th root of unity, so is $\omega^{-1}$. Remember: the resulting values are the evaluations of $g$ on the powers of $\omega^{-1}$, so $b_i = g(\omega^{-i})$.
Now, let $h(x) = \sum_i b_i x^i$. That is, re-interpret the values returned by the FFT as the coefficients of a polynomial. I claim that $h$ is almost the polynomial we are looking for. Let's calculate what values $h$ takes on at the powers of $\omega$:

$$h(\omega^i) = \sum_j b_j \omega^{ij} = \sum_j g(\omega^{-j}) \omega^{ij} = \sum_j \sum_k e_k \omega^{-jk} \omega^{ij} = \sum_k e_k \sum_j \omega^{j(i-k)}$$

Now, let's examine the quantity $\sum_j \omega^{j(i-k)}$. We claim that if $k = i$, then it is $n$, and if $k \neq i$, then it is $0$. The first claim is clear since each term of the sum is $\omega^0 = 1$.
For the second claim, we will prove that $S := \sum_j \omega^{j(i-k)}$ satisfies $\omega^{i-k} \cdot S = S$. This implies that $(\omega^{i-k} - 1) \cdot S = 0$. So either $\omega^{i-k} = 1$ or $S = 0$. The former cannot be the case since it implies $i \equiv k \pmod{n}$, which in turn implies $i = k$, which is impossible since we are in the case $k \neq i$. Thus we have $S = 0$ as desired.
So let's show that $S$ is invariant under multiplication by $\omega^{i-k}$. Basically, it will come down to the fact that $\omega^{n(i-k)} = 1$:

$$\omega^{i-k} \cdot S = \sum_{j=0}^{n-1} \omega^{(j+1)(i-k)} = \sum_{j=1}^{n} \omega^{j(i-k)} = \sum_{j=0}^{n-1} \omega^{j(i-k)} = S$$

So now we know that $h(\omega^i) = n \cdot e_i$.
So if we define $f = h / n$, then $f(\omega_i) = e_i$ for every $i$ as desired. Thus we have our interpolation algorithm, sometimes called an inverse FFT or IFFT:
Algorithm: computing $\mathsf{IFFT}(e, \omega)$
Input: $[e_0, \dots, e_{n-1}]$, the points we want to interpolate, and $\omega$, an $n$th root of unity.
- Interpret the input array as the coefficients of a polynomial $g(x) = \sum_i e_i x^i$.
- Let $b = \mathsf{FFT}(g, \omega^{-1})$.
- Output the polynomial $h / n$, where $h(x) = \sum_i b_i x^i$. I.e., in terms of the dense-coefficients form, output the vector $[b_0 / n, \dots, b_{n-1} / n]$.
Note that this algorithm also takes time $O(n \log n)$.
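The steps above can be sketched in code. The following self-contained example again uses the toy field $\mathbb{F}_{17}$ with $\omega = 4$; the names and the tiny field are illustrative, not the real codebase's API.

```rust
const P: u64 = 17;

fn add(a: u64, b: u64) -> u64 { (a + b) % P }
fn mul(a: u64, b: u64) -> u64 { (a * b) % P }

fn pow_mod(mut b: u64, mut e: u64) -> u64 {
    let mut acc = 1;
    while e > 0 {
        if e & 1 == 1 { acc = mul(acc, b); }
        b = mul(b, b);
        e >>= 1;
    }
    acc
}

// Inverse via Fermat's little theorem: a^{p-2} = a^{-1} mod p.
fn inv(a: u64) -> u64 { pow_mod(a, P - 2) }

fn fft(coeffs: &[u64], w: u64) -> Vec<u64> {
    let n = coeffs.len();
    if n == 1 { return coeffs.to_vec(); }
    let m = n / 2;
    let f0: Vec<u64> = coeffs.iter().step_by(2).copied().collect();
    let f1: Vec<u64> = coeffs.iter().skip(1).step_by(2).copied().collect();
    let e = fft(&f0, mul(w, w));
    let o = fft(&f1, mul(w, w));
    let mut out = vec![0u64; n];
    let mut wi = 1u64;
    for i in 0..n {
        out[i] = add(e[i % m], mul(wi, o[i % m]));
        wi = mul(wi, w);
    }
    out
}

/// Interpolate: given the evaluations of f on 1, w, ..., w^{n-1},
/// recover the coefficients of f.
fn ifft(evals: &[u64], w: u64) -> Vec<u64> {
    let n_inv = inv(evals.len() as u64 % P);
    // Run the FFT with the inverse root of unity, then scale by 1/n.
    fft(evals, inv(w)).into_iter().map(|b| mul(b, n_inv)).collect()
}
```

Running `ifft` on the output of `fft` gives back the original coefficients, as the derivation above predicts.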
Takeaways
- Polynomials can be represented as a list of coefficients or a list of evaluations on a set $S$
- If the set $S$ is the set of powers of an $n$th root of unity, there are $O(n \log n)$ time algorithms for converting back and forth between those two representations
- In evaluations form, polynomials can be added and multiplied in time $O(n)$
- TODO: caveat about hitting degree
Exercises
- Implement types `DensePolynomial<F: FftField>` and `Evaluations<F: FftField>` that wrap a `Vec<F>` and implement the FFT algorithms described above for converting between them
- Familiarize yourself with the types and functions provided by `ark_poly`
Commitments
A “commitment scheme” is a cryptographic scheme that lets us provide a “commitment” to a given piece of data, in such a way that we can later “open” that commitment.
I recommend checking out section 2.4.1 of David’s book Real-World Cryptography. There are two properties we typically expect of our commitment schemes:
- Hiding. A commitment to a piece of data should not reveal anything about that data.
- Binding. There should be no ambiguity about what data is committed to. That is, it should not be possible for the committer to open the commitment to a piece of data other than the one they originally committed to.
There are various kinds of commitments that allow opening in different ways, revealing only part of the data, or some function of the data. Sometimes it is even useful to elide the hiding property entirely, giving so-called non-hiding commitments.
Simple commitments
The simplest kind of commitment is one in which opening reveals all of the underlying data. Let's give a simple construction of such a scheme for committing to field elements. Suppose we have
- $\mathbb{F}_p$, a prime order field, with $p$ being rather large (say on the order of $2^{256}$).
- A hash function $h$ that takes a pair of field elements to a field element.
Then we define

$$\mathsf{commit}(x, r) := h(x, r)$$
The argument $r$ of the $\mathsf{commit}$ function is data that must only be known to the committer (until the commitment is opened). When a committer wants to commit to a field element $x$, they sample a random "blinder" $r$ and hash it together with $x$ to form the commitment.
To open the commitment $c$, they simply provide the committed value $x$ together with the blinder $r$. Alternatively, if the verifier already knows $x$, they can just provide $r$. Finally, given the commitment and the opening, we can verify whether $x$ was the value originally committed to using the $\mathsf{verify}$ function, which recomputes the hash.
If the hash function is collision-resistant, then this scheme is binding (because there's no way the committer could find another preimage of $c$).
If the hash function is one-way, then this commitment is also hiding (assuming $r$ is revealed only as part of the opening).
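The commit/open flow can be sketched in a few lines. Here a toy (NOT cryptographic) 64-bit mixer stands in for a real hash function; `commit` and `verify` are illustrative names, not a real API.

```rust
// Toy stand-in for a cryptographic hash; do not use for real commitments.
fn toy_hash(x: u64, r: u64) -> u64 {
    let mut s = x.wrapping_mul(0x9E37_79B9_7F4A_7C15) ^ r;
    for _ in 0..4 {
        s ^= s >> 31;
        s = s.wrapping_mul(0xBF58_476D_1CE4_E5B9);
    }
    s
}

/// Commit to x with a random blinder r: c = h(x, r).
fn commit(x: u64, r: u64) -> u64 {
    toy_hash(x, r)
}

/// Verify an opening (x, r) against a commitment c by recomputing the hash.
fn verify(c: u64, x: u64, r: u64) -> bool {
    toy_hash(x, r) == c
}
```

Opening is just revealing `(x, r)`; the verifier recomputes the hash and compares.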
Commitments are often used in protocols between provers and verifiers. The following illustration provides an example with a prover named Peggy and a verifier named Victor.
Here Peggy commits to an input using a blinder, obtains the commitment and sends it to Victor. The interlocutors continue their protocol, but eventually to convince Victor of her claims, Peggy must send the opening proof to her earlier commitment. Victor verifies the opening (i.e. the input and blinder) against the commitment. If the verification fails, then Victor knows that Peggy was trying to trick him, otherwise Victor has sufficient assurances that Peggy was telling the truth.
Algebraic and homomorphic commitments
Instead of a cryptographic hash function, we can use elliptic curve scalar multiplication to construct a commitment scheme. Here scalar multiplication is used like a one-way function based on the hardness assumption of the elliptic curve discrete logarithm problem (ECDLP). Suppose we have
- $\mathbb{F}_p$, a prime order field, with $p$ being large (e.g. something like $2^{255}$).
- A publicly agreed generator point $G$ over an elliptic curve $E$
- Another publicly agreed curve point $H$ for which no one knows the discrete logarithm
Then we define

$$\mathsf{commit}(x, r) := xG + rH$$

where $x$ is the value being committed to, $r$ is a random blinding factor, and the commitment $xG + rH$ is a curve point.
These commitments are algebraic (i.e. they do not use a boolean-based cryptographic hash function) and have homomorphic properties: you can add commitments together to form another commitment of the added committed values. For example, if you have commitments $A = \mathsf{commit}(x_1, r_1)$ and $B = \mathsf{commit}(x_2, r_2)$, you can perform:

$$A + B = x_1 G + r_1 H + x_2 G + r_2 H = (x_1 + x_2) G + (r_1 + r_2) H = \mathsf{commit}(x_1 + x_2, r_1 + r_2)$$

In other words, the sum of commitments $A$ and $B$ is equal to the commitment of the sum of the two committed values $x_1$ and $x_2$ and blinders $r_1$ and $r_2$. This is possible because in such a scheme scalar multiplication distributes over scalar addition: $(a + b)G = aG + bG$.
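The homomorphic property is easy to see in code. As a stand-in for an elliptic curve group, this sketch uses the multiplicative group modulo a small prime, so "$xG + rH$" becomes $g^x h^r \bmod q$; the bases and modulus are toy values chosen for illustration.

```rust
const Q: u64 = 101; // toy modulus standing in for a curve group's order
const G: u64 = 2;   // public "generator"
const H: u64 = 3;   // second base; in a real scheme its dlog w.r.t. G is unknown

fn pow_mod(mut b: u64, mut e: u64) -> u64 {
    let mut acc = 1u64;
    while e > 0 {
        if e & 1 == 1 { acc = acc * b % Q; }
        b = b * b % Q;
        e >>= 1;
    }
    acc
}

/// commit(x, r) = "xG + rH", written multiplicatively: g^x * h^r mod q.
fn commit(x: u64, r: u64) -> u64 {
    pow_mod(G, x) * pow_mod(H, r) % Q
}
```

Multiplying two commitments (the analogue of adding curve points) yields a commitment to the sums of the values and blinders: `commit(5, 7) * commit(9, 11) % Q == commit(14, 18)`.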
As a cryptographic primitive, the ability to find a public curve point for which no one knows the discrete logarithm may, at first, seem rather mind-blowing and powerful.
Actually, it's as easy as it is awesome to find such a point: simply perform rejection sampling by cryptographically hashing $G$ (or, respectively, the previous hash output), using the output as the $x$-coordinate of a candidate point on $E$ and checking whether it's valid. The first valid curve point obtained is $H$ and by the hardness assumption of the ECDLP, no one knows its discrete logarithm.
Since approximately half of the hash outputs will be valid $x$-coordinates of points on $E$, sampling will terminate very quickly. Indeed, as we will see later, this process can be used to sample many public curve points for which the discrete logarithms are unknown; this is the so-called hash-to-curve algorithm.
Pedersen commitments
The homomorphic commitment described above is known as a Pedersen commitment. If you remove the blinding term $rH$ you get a non-hiding commitment, called a Pedersen hash. Both rely on the ECDLP hardness assumption.
This means that, at least theoretically, you might be lucky (or have a quantum computer) and figure out the value $h$ such that $H = hG$, which would allow you to find different values $x'$ and $r'$ to open the commitment. We say that Pedersen commitments are computationally binding and not unconditionally binding. For example, you could express $C = xG + rH$ alternatively as $C = (x + h)G + (r - 1)H$ and compute a satisfying opening pair $x' = x + h$ and $r' = r - 1$.
On the other hand, Pedersen commitments are unconditionally hiding, as there is no way (even with a magic computer) to reveal what $x$ is without knowing $r$. Lack of perfect binding is the reason why most of the "proofs" we will see later in this book are not referred to as proofs, but instead are referred to as arguments of knowledge (although we may care little about this distinction). Just remember that you need perfect binding to be called a proof.
Interestingly, it is impossible to have a commitment scheme that has both perfect hiding and perfect binding.
To recap, in cryptography the following distinctions are important
- Perfect. The property holds statistically, without hardness assumptions; also known as unconditional or statistical.
- Computational. The property relies on a hardness assumption or a computational limitation of the adversary.
Thus, said another way, Pedersen commitments provide perfect hiding and computational binding.
Vector commitments
We can commit to several values $x_1, \dots, x_n$ by sending separate Pedersen commitments to all of these values as such:

$$x_1 G + r_1 H, \quad \dots, \quad x_n G + r_n H$$

But we can instead batch/aggregate all of these commitments together as a single commitment:

$$x_1 G_1 + \dots + x_n G_n + r H$$

with $G_1, \dots, G_n$ independent bases with unknown discrete logarithms.
If you represent the $x_i$s and the $G_i$s as two vectors $\vec{x} = (x_1, \dots, x_n)$ and $\vec{G} = (G_1, \dots, G_n)$, we can quickly write the previous statement as an inner product

$$\langle \vec{x}, \vec{G} \rangle + rH$$
Vector commitments (sometimes referred to as multi-commitments) are a powerful construction because an arbitrarily large vector can be committed with a single curve point.
The naive approach to constructing an opening proof for a length-$n$ vector commitment has size $O(n)$: it is simply the tuple $(x_1, \dots, x_n, r)$. As we will see later, opening proofs for vector commitments are an interesting topic and there is a much more efficient algorithm.
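A vector commitment can be sketched the same way as the single-value Pedersen commitment, again using the multiplicative group modulo a small prime as a stand-in for a curve group; the bases `BASES` and blinding base `H` are toy values with (assumed) unknown discrete logarithms.

```rust
const Q: u64 = 101;
const BASES: [u64; 4] = [2, 3, 5, 7]; // G_1..G_4, the independent bases
const H: u64 = 11;                    // blinding base

fn pow_mod(mut b: u64, mut e: u64) -> u64 {
    let mut acc = 1u64;
    while e > 0 {
        if e & 1 == 1 { acc = acc * b % Q; }
        b = b * b % Q;
        e >>= 1;
    }
    acc
}

/// "<x, G> + rH", written multiplicatively: prod_i G_i^{x_i} * H^r mod Q.
/// The whole vector is committed as a single group element.
fn vec_commit(xs: &[u64], r: u64) -> u64 {
    let mut acc = pow_mod(H, r);
    for (x, g) in xs.iter().zip(BASES.iter()) {
        acc = acc * pow_mod(*g, *x) % Q;
    }
    acc
}
```

The homomorphic property carries over entrywise: combining two vector commitments commits to the entrywise sum of the vectors (and the sum of the blinders).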
Polynomial commitments
To construct SNARKs we use polynomial commitments. A polynomial commitment scheme for a field $\mathbb{F}$ (or it could even be a ring) is a way of committing to a polynomial $f$ to get a commitment $c$, in such a way that for any $x$, you can provide $y = f(x)$, along with an "opening proof" $\pi$ that proves that the polynomial committed to in $c$ equals $y$ when evaluated at $x$.
In other words, it is a type of commitment $C$, a type of randomness $R$, a type of opening proof $P$ along with algorithms

$$\mathsf{commit} \colon \mathbb{F}[x] \times R \to C \qquad \mathsf{open} \colon \mathbb{F}[x] \times R \times \mathbb{F} \to P \qquad \mathsf{verify} \colon C \times \mathbb{F} \times \mathbb{F} \times P \to \mathsf{Bool}$$

such that for any $f$, $r$, $x$, we have

$$\mathsf{verify}(\mathsf{commit}(f, r),\; x,\; f(x),\; \mathsf{open}(f, r, x)) = \mathsf{true}$$

and if $y \neq f(x)$ then it is not possible to compute $\pi$ such that

$$\mathsf{verify}(\mathsf{commit}(f, r),\; x,\; y,\; \pi) = \mathsf{true}$$

In other words, if $y \neq f(x)$ then every $\pi$ which is feasible to compute results in $\mathsf{verify}(\mathsf{commit}(f, r), x, y, \pi) = \mathsf{false}$.
One thing that’s pretty cool is that because polynomial commitment schemes let you construct zk-SNARKs, polynomial commitment schemes imply commitment schemes with arbitrary opening functionality. TODO
Constructing polynomial commitment schemes
All known constructions of polynomial commitment schemes are a bit complicated. The easiest to describe is called the Kate (pronounced "kah-TAY") scheme, also known as "KZG". It requires a "prime-order group with a pairing", which is three groups $G_1, G_2, G_T$ of prime order $q$ (hence, all isomorphic cyclic groups) together with a function $e \colon G_1 \times G_2 \to G_T$ such that for any $a, b \in \mathbb{F}_q$, $G \in G_1$, $H \in G_2$, we have

$$e(a \cdot G, b \cdot H) = ab \cdot e(G, H)$$

$e$ is called a "pairing" or a "bilinear pairing". What this lets us do is "multiply in the scalar" but only once.
Fix a degree bound $d$ on the polynomials we would like to be able to commit to. The KZG scheme will let us commit to polynomials of degree at most $d$. As a preliminary, fix generators $G \in G_1$ and $H \in G_2$ arbitrarily.
The first thing to know about the KZG scheme is it requires that we randomly sample some group elements to help us. This is the dreaded and much discussed trusted setup. So, anyway, we start by sampling $s$ at random from $\mathbb{F}_q$ and computing

$$G, \; sG, \; s^2 G, \; \dots, \; s^d G \qquad \text{and} \qquad H, \; sH$$

And then we throw away $s$. The security depends on no one knowing $s$, which is sometimes referred to as the toxic waste of the trusted setup. Basically we compute the generator scaled by powers of $s$ up to the degree bound. We make a security assumption about the groups which says that all anyone can really do with group elements is take linear combinations of them.
Now suppose we have a polynomial $f = \sum_{i \le d} a_i x^i$ with coefficients in $\mathbb{F}_q$ that we would like to commit to. We will describe a version of the scheme that is binding but not hiding, so it may leak information about the polynomial. Now, to commit to $f$, we compute

$$c_f = \sum_{i \le d} a_i \cdot (s^i G)$$

so that $c_f \in G_1$ and

$$c_f = \sum_{i \le d} a_i \cdot (s^i G) = \Big( \sum_{i \le d} a_i s^i \Big) G = f(s) \cdot G$$

So $c_f$ is $G$ scaled by $f(s)$, and the fact that $G_1$ is an $\mathbb{F}_q$-module (i.e. a vector space whose scalars come from $\mathbb{F}_q$) means we can compute $c_f$ from the values $s^i G$ and the coefficients of $f$ without knowing $s$.
Now how does opening work? Well, say we want to open at a point $z$ to $v = f(z)$. Then the polynomial $f - v$ vanishes at $z$, which means that it is divisible by the polynomial $x - z$ (exercise: use polynomial division and analyze the remainder).
So, the opener can compute the polynomial

$$q(x) = \frac{f(x) - v}{x - z}$$

and commit to it as above to get a commitment $c_q$. And $c_q$ will be the opening proof. It remains only to describe verification. It works like this: the verifier checks

$$e(c_f - v \cdot G, \; H) = e(c_q, \; sH - zH)$$

This amounts to checking: "is the polynomial committed to by $c_f$, minus $v$, equal to the polynomial committed to by $c_q$ times $x - z$"?
To see why, remember that $c_f = f(s) \cdot G$ and $c_q = q(s) \cdot G$, so we are checking

$$e((f(s) - v) \cdot G, \; H) = e(q(s) \cdot G, \; (s - z) \cdot H)$$

which by the bilinearity of the pairing is the same as checking

$$f(s) - v = q(s) \cdot (s - z)$$
Bootleproof inner product argument
Polynomial commitments
A polynomial commitment is a scheme that allows you to commit to a polynomial (i.e. to its coefficients). Later, someone can ask you to evaluate the polynomial at some point and give them the result, which you can do as well as provide a proof of correct evaluation.
Schwartz-Zippel lemma
TODO: move this section where most relevant
Let $f$ be a non-zero polynomial of degree $d$ over a field of size $n$. Then the probability that $f(r) = 0$ for a uniformly random $r$ is at most $d / n$.
In a similar fashion, two distinct degree-$d$ polynomials $f$ and $g$ can intersect in at most $d$ points. This means that the probability that $f(r) = g(r)$ on a random $r$ is at most $d / n$. This is a direct corollary of the Schwartz-Zippel lemma, because it is equivalent to the probability that $h(r) = 0$ with $h = f - g$.
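The lemma is easy to check by brute force over a tiny field. The sketch below counts, over all of $\mathbb{F}_{17}$, the points where two distinct polynomials agree; names and the toy field are ours for illustration.

```rust
const P: u64 = 17; // a tiny field, small enough to enumerate exhaustively

/// Evaluate a polynomial (coefficients low degree first) via Horner's rule.
fn eval(coeffs: &[u64], x: u64) -> u64 {
    coeffs.iter().rev().fold(0, |acc, &c| (acc * x + c) % P)
}

/// Count the points of F_p on which two polynomials agree.
fn agreements(f: &[u64], g: &[u64]) -> usize {
    (0..P).filter(|&x| eval(f, x) == eval(g, x)).count()
}
```

For example, $f(x) = x^3$ and $g(x) = x$ agree exactly on $\{0, 1, 16\}$ in $\mathbb{F}_{17}$ (the roots of $x^3 - x$), so a random point distinguishes them with probability $14/17$, consistent with the $d/n = 3/17$ bound on agreement.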
Inner product argument
What is an inner product argument?
The inner product argument is the following construction: given the commitments (for now let's say the hash) of two vectors $\vec{a}$ and $\vec{b}$ of size $n$ and with entries in some field $\mathbb{F}$, prove that their inner product $\langle \vec{a}, \vec{b} \rangle$ is equal to $z$.
There exist different variants of this inner product argument. In some versions, none of the values ($\vec{a}$, $\vec{b}$ and $z$) are given, only commitments. In some other version, which is interesting to us and that I will explain here, only $\vec{a}$ is unknown.
How is that useful?
Inner product arguments are useful for several things, but what we're using them for in Mina is polynomial commitments. The rest of this post won't make too much sense if you don't know what a polynomial commitment is, but briefly: it allows you to commit to a polynomial $f$ and then later prove its evaluation at some point $s$. Check my post on Kate polynomial commitments for more on polynomial commitment schemes.
How does that translate to the inner product argument though? First, let's see our polynomial $f$ as a vector of coefficients:

$$\vec{f} = (f_0, f_1, \dots, f_{n-1})$$

Then notice that

$$f(s) = f_0 + f_1 s + f_2 s^2 + \dots + f_{n-1} s^{n-1} = \langle \vec{f}, (1, s, s^2, \dots, s^{n-1}) \rangle$$

And here's our inner product again.
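This identity, evaluation as an inner product with the powers of the point, is a one-liner to check. A small sketch over the toy field $\mathbb{F}_{17}$ (illustrative names, not a real API):

```rust
const P: u64 = 17;

/// Inner product of two equal-length vectors over F_p.
fn inner(a: &[u64], b: &[u64]) -> u64 {
    a.iter().zip(b).fold(0, |acc, (&x, &y)| (acc + x * y) % P)
}

/// The vector (1, x, x^2, ..., x^{n-1}).
fn powers(x: u64, n: usize) -> Vec<u64> {
    let mut v = Vec::with_capacity(n);
    let mut cur = 1u64;
    for _ in 0..n {
        v.push(cur);
        cur = cur * x % P;
    }
    v
}
```

For $f(x) = 1 + 2x + 3x^2 + 4x^3$, the inner product of the coefficient vector with `powers(2, 4)` is $f(2) = 49 \equiv 15 \pmod{17}$.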
The idea behind Bootleproof-type of inner product argument
The inner product argument protocol I'm about to explain was invented by Bootle et al. It was later optimized in the Bulletproof paper (hence why we unofficially call the first paper bootleproof), and then some more in the Halo paper. It's the latter optimization that I'll explain here.
A naive approach
So before I get into the weeds, what's the high-level? Well first, what's a naive way to prove that we know the pre-image of a hash $A$, the vector $\vec{a}$, such that $\langle \vec{a}, \vec{b} \rangle = z$? We could just reveal $\vec{a}$ and let anyone verify that indeed, hashing it gives out $A$, and that it also verifies the equation $\langle \vec{a}, \vec{b} \rangle = z$.
Obviously, we have to reveal $\vec{a}$ itself, which is not great. But we'll deal with that later, trust me. What we want to tackle first here is the proof size, which is the size $n$ of the vector $\vec{a}$. Can we do better?
Reducing the problem to a smaller problem to prove
The inner product argument reduces the opening proof by using an intermediate reduction proof, where the size of the reduced witness $\vec{a}'$ is half the size of $\vec{a}$, and as such the final opening proof ($\vec{a}'$) is half the size of our naive approach.
The reduction proof is where most of the magic happens, and this reduction can be applied many times ($\log_2 n$ times to be exact) to get a final opening proof of size 1. Of course the entire proof is not just the final opening proof of size 1, but all the elements involved in the reduction proofs. It can still be much smaller than the original proof of size $n$.
So most of the proof size comes from the multiple reduction subproofs that you’ll end up creating on the way. Our proof is really a collection of miniproofs or subproofs.
One last thing before we get started: Pedersen hashing and commitments
To understand the protocol, you need to understand commitments. I’ve used hashing so far, but hashing with a hash function like SHA-3 is not great as it has no convenient mathematical structure. We need algebraic commitments, which will allow us to prove things on the committed value without revealing the value committed. Usually what we want is some homomorphic property that will allow us to either add commitments together or/and multiply them together.
For now, let's see a simple non-hiding commitment: a Pedersen hash. To commit to a single value $x$, simply compute:

$$xG$$

where the discrete logarithm of $G$ is unknown. To open the commitment, simply reveal the value $x$.
We can also perform multi-commitments with Pedersen hashing. For a vector of values $(x_1, \dots, x_n)$, compute:

$$x_1 G_1 + \dots + x_n G_n$$

where each $G_i$ is distinct and has an unknown discrete logarithm as well. I'll often shorten the last formula as the inner product $\langle \vec{x}, \vec{G} \rangle$ for $\vec{x} = (x_1, \dots, x_n)$ and $\vec{G} = (G_1, \dots, G_n)$. To reveal a commitment, simply reveal the values $x_i$.
Pedersen hashing allows commitments that are non-hiding, but binding, as you can't open them to a different value than the originally committed one. And as you can see, adding the commitments of $\vec{x}$ and $\vec{y}$ gives us the commitment of $\vec{x} + \vec{y}$:

$$\langle \vec{x}, \vec{G} \rangle + \langle \vec{y}, \vec{G} \rangle = \langle \vec{x} + \vec{y}, \vec{G} \rangle$$
which will be handy in our inner product argument protocol.
The protocol
Set up
Here are the settings of our protocol. Known only to the prover is the secret vector

$$\vec{a} = (a_1, a_2, a_3, a_4)$$

The rest is known to both:
- $\vec{G} = (G_1, G_2, G_3, G_4)$, a basis for Pedersen hashing
- $A = \langle \vec{a}, \vec{G} \rangle$, the commitment of $\vec{a}$
- $\vec{b} = (1, s, s^2, s^3)$, the powers of some value $s$
- $z = \langle \vec{a}, \vec{b} \rangle$, the result of the inner product
For the sake of simplicity, let’s pretend that this is our problem, and we just want to halve the size of our secret vector before revealing it. As such, we will only perform a single round of reduction. But you can also think of this step as being already the reduction of another problem twice as large.
We can picture the protocol as follows:
- The prover first sends a commitment to the polynomial $f$.
- The verifier sends a point $s$, asking for the value $f(s)$. To help the prover perform a proof of correct evaluation, they also send a random challenge $x$. (NOTE: The verifier sends the random challenge $x$ ONLY AFTER they receive the commitment.)
- The prover sends the result of the evaluation, $z = f(s)$, as well as a proof.

Does that make sense? Of course what's interesting to us is the proof, and how the prover uses that random challenge $x$.
Reduced problem
First, the prover cuts everything in half: $\vec{a} = (\vec{a}_{lo}, \vec{a}_{hi})$, $\vec{b} = (\vec{b}_{lo}, \vec{b}_{hi})$, $\vec{G} = (\vec{G}_{lo}, \vec{G}_{hi})$. Then they use $x$ to construct linear combinations of these cuts:

$$\vec{a}' = \vec{a}_{lo} + x \, \vec{a}_{hi} \qquad \vec{b}' = \vec{b}_{lo} + x^{-1} \, \vec{b}_{hi} \qquad \vec{G}' = \vec{G}_{lo} + x^{-1} \, \vec{G}_{hi}$$

This is how the problem is reduced to $\langle \vec{a}', \vec{b}' \rangle = z'$.
At this point, the prover can send $\vec{a}'$, $\vec{b}'$, and $z'$ and the verifier can check if indeed $\langle \vec{a}', \vec{b}' \rangle = z'$. But that wouldn't make much sense would it? Here we also want:
- a proof that proving that statement is the same as proving the previous statement ($\langle \vec{a}, \vec{b} \rangle = z$)
- a way for the verifier to compute $\vec{b}'$ and $z'$ and $A'$ (the new commitment) by themselves.
The actual proof
The verifier can compute $\vec{b}'$ as they have everything they need to do so.
What about $A'$, the commitment of $\vec{a}'$ which uses the new basis $\vec{G}'$? It should be the following value:

$$A' = \langle \vec{a}', \vec{G}' \rangle = \langle \vec{a}_{lo} + x \, \vec{a}_{hi}, \; \vec{G}_{lo} + x^{-1} \, \vec{G}_{hi} \rangle = A + x^{-1} \langle \vec{a}_{lo}, \vec{G}_{hi} \rangle + x \langle \vec{a}_{hi}, \vec{G}_{lo} \rangle = A + x^{-1} L + x R$$

So to compute this new commitment, the verifier needs:
- the previous commitment $A$, which they already have
- some powers of $x$, which they can compute
- two curve points $L = \langle \vec{a}_{lo}, \vec{G}_{hi} \rangle$ and $R = \langle \vec{a}_{hi}, \vec{G}_{lo} \rangle$, which the prover will have to provide to them
What about $z'$? Recall:

$$z' = \langle \vec{a}', \vec{b}' \rangle$$

So the new inner product should be:

$$z' = \langle \vec{a}_{lo} + x \, \vec{a}_{hi}, \; \vec{b}_{lo} + x^{-1} \, \vec{b}_{hi} \rangle = z + x^{-1} \langle \vec{a}_{lo}, \vec{b}_{hi} \rangle + x \langle \vec{a}_{hi}, \vec{b}_{lo} \rangle = z + x^{-1} L_z + x R_z$$

Similarly to $A'$, the verifier can recompute $z'$ from the previous value $z$ and two scalar values $L_z$ and $R_z$ which the prover needs to provide.
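One halving round is easy to sanity-check with scalars only (no curve points), over the toy field $\mathbb{F}_{17}$. The sketch assumes the folding convention $\vec{a}' = \vec{a}_{lo} + x \, \vec{a}_{hi}$ and $\vec{b}' = \vec{b}_{lo} + x^{-1} \, \vec{b}_{hi}$; names and field are illustrative.

```rust
const P: u64 = 17;

/// Inner product over F_p.
fn inner(a: &[u64], b: &[u64]) -> u64 {
    a.iter().zip(b).fold(0, |acc, (&x, &y)| (acc + x * y) % P)
}

/// Fold the two halves of v with a challenge c: entry-wise v_lo + c * v_hi.
fn fold(v: &[u64], c: u64) -> Vec<u64> {
    let m = v.len() / 2;
    (0..m).map(|i| (v[i] + c * v[m + i]) % P).collect()
}
```

Folding `a` with `x` and `b` with `x^{-1}` leaves the claimed relation checkable from only two extra scalars, the cross terms: the identity $\langle \vec{a}', \vec{b}' \rangle = z + x^{-1} L_z + x R_z$ holds entry for entry.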
So in the end, the proof has become:
- the vector $\vec{a}'$ which is half the size of $\vec{a}$
- the curve points $L$ and $R$ (around two field elements, if compressed)
- the scalar values $L_z$ and $R_z$
We can update our previous diagram:

In our example, the naive proof was to reveal $\vec{a}$, which was 4 field elements. We are now revealing instead 2 + 2 + 2 = 6 field elements. This is not great, but if $\vec{a}$ was much larger (let's say 128), the reduction in half would still give 64 + 2 + 2 = 68 field elements. Not bad, no? We can do better though…
The HALO optimization
The HALO optimization is similar to the bulletproof optimization, but it further reduces the size of our proof, so I’ll explain that directly.
With the HALO optimization, the prover translates the problem into the following single commitment:

$$C = A + z U = \langle \vec{a}, \vec{G} \rangle + \langle \vec{a}, \vec{b} \rangle U$$

with $U$ another curve point of unknown discrete logarithm. This is simply a commitment of $\vec{a}$ and $z$.
A naive proof is to reveal $\vec{a}$ and let the verifier check that it is a valid opening of the commitment $C$. Then, that commitment will be reduced recursively to commitments of the same form.
The whole point is that the reduction proofs will be smaller than our previous bootleproof-inspired protocol.
How does the reduction proof work? Notice that this is the new commitment:

$$C' = A' + z' U = A + x^{-1} L + x R + (z + x^{-1} L_z + x R_z) U$$

This is simply from copy/pasting the equations from the previous section. This can be further reduced to:

$$C' = C + x^{-1} (L + L_z U) + x (R + R_z U) = C + x^{-1} L' + x R'$$

And now you see that the verifier now only needs, in addition to the previous commitment $C$, two curve points (~ 2 field elements):

$$L' = L + L_z U \qquad R' = R + R_z U$$

this is in contrast to the 4 field elements per reduction needed without this optimization. Much better right?
At the end of a round (or the protocol) the verifier can compute the expected commitment as such:

$$C' = C + x^{-1} L' + x R'$$

and open it by computing the following and checking it is indeed equal to $C'$:

$$\langle \vec{a}', \vec{G}' \rangle + \langle \vec{a}', \vec{b}' \rangle U$$

For this, the verifier needs to recompute $\vec{G}'$ and $\vec{b}'$ by themselves, which they can as they have all the necessary information. We can update our previous diagram:

What about zero-knowledge?
Didn't we forget something? Oh right, we're sending the final opening value $a$ in the clear, a single element that will leak some information about the original vector $\vec{a}$ (as it is a linear combination of that original vector).
The simple solution is to alter our Pedersen commitment to make it hiding on top of being binding:

$$C = \langle \vec{a}, \vec{G} \rangle + \langle \vec{a}, \vec{b} \rangle U + r H$$

where $H$ is another generator we don't know the discrete logarithm of, and $r$ is a random scalar value generated by the prover to blind the commitment.
But wait, each $L'$ and $R'$ also leaks some! As they are also made from the original secret vector $\vec{a}$. No worries, we can perform the same treatment on those curve points and blind them like so:

$$L' = L + L_z U + r_L H \qquad R' = R + R_z U + r_R H$$

In order to open the final commitment, the verifier first recomputes the expected commitment as before:

$$C' = C + x^{-1} L' + x R'$$

then uses $a$ and the final blinding value $r'$ sent by the prover (and composed of $r$ and all the rounds' $r_L$ and $r_R$), as well as the reconstructed $G'$ and $b'$, to open the commitment:

$$C' \overset{?}{=} a G' + a b' U + r' H$$

with $r'$ being equal to something like

$$r' = r + \sum_i \left( x_i^{-1} r_{L_i} + x_i r_{R_i} \right)$$
At this point, the protocol requires the prover to send:
- 2 curve points $L'$ and $R'$ per round
- 1 scalar value $a$ for the final opening
- 1 blinding (scalar) value $r'$ for the final opening
But wait… one last thing. In this protocol the prover is revealing $a$, and even if they were not, by revealing $r'$ they might allow someone to recompute $r$… The HALO paper contains a generalized Schnorr protocol to open the commitment without revealing $a$ nor $r'$.

from Vanishree:
- So in general the more data we send the more randomness we need to ensure the private aspects are hidden, right
- The randomness in a commitment is because we are sending the commitment elements
- The random elements mixed with the polynomial (in the new zkpm technique) is because we send evaluations of the polynomial at zeta and zeta omega later
- Zk in Schnorr opening is because we reveal the opening values
where can I find a proof? perhaps appendix C of https://eprint.iacr.org/2017/1066.pdf
The real protocol, and a note on non-interactivity
Finally, we can look at what the real protocol ends up looking like with $\log_2 n$ rounds of reduction followed by a commitment opening.

So far the protocol was interactive, but you can make it non-interactive by simply using the Fiat-Shamir transformation. That’s all I’ll say about that.
Different functionalities
There are a number of useful things we can do on top of a bootleproof-style polynomial commitment. I'll briefly list them here.
Enforcing an upperbound on the polynomial degree
Imagine that you want to enforce a maximum degree on a committed polynomial. You can do that by asking the prover to shift the coefficients of the polynomial to the right, so much that it becomes impossible to fit them if the polynomial were to be of degree more than the maximum degree we wanted to enforce. In other words, the prover commits to the shifted polynomial $x^{k} \cdot f(x)$, where $k$ is chosen so that a violation of the degree bound would overflow the SRS.
When the verifier verifies the opening, they will have to shift the received evaluation in the same way, multiplying it by the same power of the evaluation point.
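The shift's effect on evaluations is simple to check: if the commitment is to $x^k \cdot f(x)$, its evaluation at $z$ is $z^k \cdot f(z)$. A sketch over the toy field $\mathbb{F}_{17}$ (illustrative names, not the real library's API):

```rust
const P: u64 = 17;

fn pow_mod(mut b: u64, mut e: u64) -> u64 {
    let mut acc = 1u64;
    while e > 0 {
        if e & 1 == 1 { acc = acc * b % P; }
        b = b * b % P;
        e >>= 1;
    }
    acc
}

/// Evaluate a polynomial (coefficients low degree first) via Horner's rule.
fn eval(coeffs: &[u64], x: u64) -> u64 {
    coeffs.iter().rev().fold(0, |acc, &c| (acc * x + c) % P)
}

/// Shift the coefficients right by s positions, i.e. multiply by x^s.
fn shift(coeffs: &[u64], s: usize) -> Vec<u64> {
    let mut out = vec![0u64; s];
    out.extend_from_slice(coeffs);
    out
}
```

So the verifier, receiving the claimed value $f(z)$, compares it against the opening of the shifted commitment after multiplying by $z^k$.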
Aggregating opening proofs for several polynomials
Insight:
Aggregating opening proofs for several evaluations
Insight:
Double aggregation
Insight:
Note that this kind of aggregation forces us to provide all combinations of evaluations, some of which might not be needed (for example, ).
Splitting a polynomial
If a polynomial is too large to fit in one SRS, you can split it into chunks of size at most the size of the SRS.
Proof of correct commitment to a polynomial
That is useful in HALO. Problem statement: given a commitment $C$ and a polynomial $f$, prove to me that $C$ is a commitment to $f$, i.e. that $C = \mathsf{commit}(f)$. The proof is simple:
- generate a random point $s$
- evaluate $f$ at $s$ to obtain $f(s)$
- ask for an evaluation proof of $C$ on $s$. If it evaluates to $f(s)$ as well, then $C$ is a commitment to $f$ with overwhelming probability
Overview
Many modern proof systems (and I think all that are in use) are constructed according to the following recipe.
- You start out with a class of computations.
- You devise a way to arithmetize those computations. That is, to express your computation as a statement about polynomials.
More specifically, you describe what is often called an “algebraic interactive oracle proof” (AIOP) that encodes your computation. An AIOP is a protocol describing an interaction between a prover and a verifier, in which the prover sends the verifier some “polynomial oracles” (basically a black box function that given a point evaluates a polynomial at that point), the verifier sends the prover random challenges, and at the end, the verifier queries the prover’s polynomials at points of its choosing and makes a decision as to whether it has been satisfied by the proof.
- An AIOP is an imagined interaction between parties. It is an abstract description of the protocol that will be "compiled" into a SNARK. There are several "non-realistic" aspects about it. One is that the prover sends the verifier black-box polynomials that the verifier can evaluate. These polynomials have degree comparable to the size of the computation being verified. If we implemented these "polynomial oracles" by having the prover really send these large polynomials (say by sending all their coefficients), then we would not have a zk-SNARK at all, since the verifier would have to read these linearly sized polynomials so we would lose succinctness, and the polynomials would not be black-box functions, so we may lose zero-knowledge.
Instead, when we concretely instantiate the AIOP, we have the prover send constant-sized, hiding polynomial commitments. Then, in the phase of the AIOP where the verifier queries the polynomials, the prover sends an opening proof for the polynomial commitments which the verifier can check, thus simulating the activity of evaluating the prover’s polynomials on your own.
So this is the next step of making a SNARK: instantiating the AIOP with a polynomial commitment scheme of one’s choosing. There are several choices here and these affect the properties of the SNARK you are constructing, as the SNARK will inherit efficiency and setup properties of the polynomial commitment scheme used.
- An AIOP describes an interactive protocol between the verifier and the prover. In reality, typically, we also want our proofs to be non-interactive.
This is accomplished by what is called the Fiat–Shamir transformation. The basic idea is this: all that the verifier is doing is sampling random values to send to the prover. Instead, to generate a “random” value, the prover simulates the verifier by hashing its messages. The resulting hash is used as the “random” challenge.
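The Fiat–Shamir idea can be sketched as a transcript object: the prover absorbs every message it would have sent, and each "verifier challenge" is squeezed out of the running hash state. The mixer below is a toy (NOT cryptographic) stand-in for a real sponge such as Poseidon; all names are illustrative.

```rust
/// A toy Fiat-Shamir transcript: a running hash state that absorbs
/// prover messages and yields deterministic "random" challenges.
struct Transcript {
    state: u64,
}

impl Transcript {
    fn new() -> Self {
        Transcript { state: 0x243F_6A88_85A3_08D3 } // arbitrary fixed IV
    }

    /// Mix a prover message into the state (toy mixing, not cryptographic).
    fn absorb(&mut self, msg: u64) {
        let mut s = self.state ^ msg.wrapping_mul(0x9E37_79B9_7F4A_7C15);
        s ^= s >> 29;
        s = s.wrapping_mul(0xBF58_476D_1CE4_E5B9);
        s ^= s >> 32;
        self.state = s;
    }

    /// Derive a challenge, then ratchet so the next challenge is fresh.
    fn challenge(&mut self) -> u64 {
        let c = self.state;
        self.absorb(c);
        c
    }
}
```

Both prover and verifier run the same transcript, so they derive identical challenges without any interaction; any change to an earlier message changes every later challenge.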
At this point we have a fully non-interactive proof. Let’s review our steps.
- Start with a computation.
- Translate the computation into a statement about polynomials and design a corresponding AIOP.
- Compile the AIOP into an interactive protocol by having the prover send hiding polynomial commitments instead of polynomial oracles.
- Get rid of the verifier-interaction by replacing it with a hash function. I.e., apply the Fiat–Shamir transform.
zk-SNARKs
Finally we can discuss zk-SNARKs. In these notes right now I will not start with intuitive motivations for SNARKs because I can’t think of them right now, but maybe later.
Let $P$ be a program (in some yet to be specified programming language). zk-SNARKs let you produce proofs for statements of the form

"I know $w$ such that $P(w) = \mathsf{true}$"

in such a way that no information about $w$ is revealed by the proof (this is the zero-knowledge or ZK part of them).
In other words, zk-SNARKs allow you to prove execution of programs, where you are also allowed to hide a selected portion (i.e., the $w$ portion) of the data involved in the computation.
We could also phrase zk-SNARKs in terms of computing functions $f(w) = y$, which seems more general, but actually this would be taken care of by the previous sense where we can only use boolean-valued functions, by proving against the boolean-valued program defined by $P(w) := (f(w) = y)$.
SNARK stands for Succinct Non-interactive ARgument of Knowledge. If it also satisfies zero-knowledge, then it is called a zk-SNARK. Preprocessing SNARKs allow the verifier to precompute some encodings of the relation so that verification runs independently of the instance length. Definitions of these schemes owe the succinctness property to the fact that both the verifier and the proof length are (at most) polylogarithmic in the size of the computation (the witness length). Recent schemes such as Plonk go beyond that definition and provide zk-SNARKs with constant proof length. Unlike the verifier in preprocessing SNARKs, the prover can't be faster than linear, as it has to at least read the circuit.
Expressing computations with “iterating over trace tables”
We will define a notion of computation (a programming language basically) for which it is easy to construct zk-SNARKs.
This section explains how to design and add a custom constraint to our proof-systems library.
PLONK is an AIOP. That is, it is a protocol in which the prover sends polynomials as messages and the verifier sends random challenges, and then evaluates the prover’s polynomials and performs some final checks on the outputs.
PLONK is very flexible. It can be customized with constraints specific to computations of interest. For example, in Mina, we use a PLONK configuration called kimchi that has custom constraints for poseidon hashing, doing elliptic curve operations, and more.
A “PLONK configuration” specifies
- The set of types of constraints that you would like to be able to enforce. We will describe below how these types of constraints are specified.
- A number of "eq-able" columns `W`
- A number of "advice" columns `A`
Under such a configuration, a circuit is specified by
- A number of rows `n`
- A vector `cs` of constraint-types of length `n`. I.e., a vector that specifies, for each row, which types of constraints should be enforced on that row.
- A vector `eqs : Vec<(Position, Position)>` of equalities to enforce, where `struct Position { row: usize, col: usize }`. E.g., if the pair `(Position { row: 0, col: 8 }, Position { row: 10, col: 2 })` is in `eqs`, then the circuit is saying the entries in those two positions should be equal, or in other words that they refer to the same value. This is where the distinction between "eq-able" and "advice" columns comes in. The `col` field of a position in the `eqs` array can only refer to one of the first `W` columns. Equalities cannot be enforced on entries in the `A` columns after that.
Then, given such a circuit, PLONK lets you produce proofs for the statement

> I know `W + A` "column vectors" of field elements `vs: [Vec<F>; W + A]` such that for each row index `i < n`, the constraint of type `cs[i]` holds on the values `[vs[0][i], ..., vs[W+A-1][i], vs[0][i+1], ..., vs[W+A-1][i+1]]` and all the equalities in `eqs` hold. I.e., for `(p1, p2)` in `eqs` we have `vs[p1.col][p1.row] == vs[p2.col][p2.row]`.

So, a constraint can check the values in two adjacent rows.
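The circuit description above can be sketched as plain Rust data types. This is a minimal sketch with hypothetical names (`ConstraintType`, `Circuit`, `eqs_hold`); the actual proof-systems types differ.

```rust
// Hypothetical sketch of a PLONK-style circuit description; names are
// illustrative and do not match the actual proof-systems crate.

#[derive(Clone, Copy, PartialEq, Eq, Debug)]
pub struct Position {
    pub row: usize,
    pub column: usize,
}

/// Which type of constraint is enforced on a given row.
#[derive(Clone, Copy, Debug)]
pub enum ConstraintType {
    Generic,
    Poseidon,
    CompleteAdd,
}

pub struct Circuit {
    /// Number of rows `n`.
    pub rows: usize,
    /// `cs`: one constraint type per row (length `n`).
    pub cs: Vec<ConstraintType>,
    /// Pairs of positions whose values must be equal (copy constraints).
    pub eqs: Vec<(Position, Position)>,
}

impl Circuit {
    /// Check the equalities against a filled witness table `vs[column][row]`,
    /// exactly as in the statement above.
    pub fn eqs_hold(&self, vs: &[Vec<u64>]) -> bool {
        self.eqs
            .iter()
            .all(|(p1, p2)| vs[p1.column][p1.row] == vs[p2.column][p2.row])
    }
}
```

The `eqs_hold` check is what the permutation argument enforces succinctly inside the proof; here it is just spelled out directly.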
Specifying a constraint
Mathematically speaking, a constraint is a multivariate polynomial over the variables $c_{\mathsf{Curr},0}, \ldots, c_{\mathsf{Curr},W+A-1}, c_{\mathsf{Next},0}, \ldots, c_{\mathsf{Next},W+A-1}$. In other words, there is one variable corresponding to the value of each column in the "current row" and one variable corresponding to the value of each column in the "next row".

In Rust, the variable $c_{r,i}$ is written `E::cell(Column::Witness(i), r)`. So, for example, the variable $c_{\mathsf{Next},3}$ is written `E::cell(Column::Witness(3), CurrOrNext::Next)`.
```rust
let w = |i| v(Column::Witness(i));
```
Defining a PLONK configuration
The art in proof systems comes from knowing how to design a PLONK configuration to ensure maximal efficiency for the sorts of computations you are trying to prove. That is, how to choose the numbers of columns W and A, and how to define the set of constraint types.
Let’s describe the trade-offs involved here.
The majority of the proving time for the PLONK prover is in

- committing to the `W + A` column polynomials, which have length equal to the number of rows `n`
- committing to the "permutation accumulator polynomial", which has length `n`
- committing to the quotient polynomial, which reduces to computing `max(k, W)` MSMs of size `n`, where `k` is the max degree of a constraint
- performing the commitment opening proof, which is mostly dependent on the number of rows `n`
So all in all, the proving time is approximately equal to the time to perform W + A + 1 + max(k - 1, W) MSMs of size n, plus the cost of an opening proof for polynomials of degree n - 1.
and maybe

- computing the combined constraint polynomial, which has degree `k * n` where `k` is the maximum degree of a constraint

Increasing `W` and `A` increases the proof size, and potentially impacts the prover time, as the prover must compute polynomial commitments to each column, and computing a polynomial commitment corresponds to doing one MSM (multi-scalar multiplication, also called a multi-exponentiation). However, often increasing the number of columns allows you to decrease the number of rows required for a given computation. For example, if you can perform one Poseidon hash in 36 rows with 5 total columns, then you can also perform it in 12 (= 36 / 3) rows with 15 (= 5 * 3) total columns.
Decreasing the number of rows (even while keeping the total number of table entries the same) is desirable because it reduces the cost of the polynomial commitment opening proof, which is dominated by a factor linear in the number of rows, and barely depends on the number of columns.
Increasing the number of columns also increases verifier time, as the verifier must perform one scalar-multiplication and one hash per column. Proof length is also affected by a larger number of columns, as more polynomials need to be committed and sent along to the verifier.
There is typically some interplay between these factors when choosing a configuration.
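As a quick sanity check, the approximate MSM count and the Poseidon row/column trade-off from this section can be written out directly. This is a sketch using the approximations given above; `approx_msm_count` and `poseidon_tradeoff` are made-up helper names.

```rust
/// Approximate number of size-`n` MSMs performed by the prover, following
/// the breakdown above: `W + A` column commitments, one permutation
/// accumulator commitment, and `max(k - 1, W)` commitments for the
/// quotient polynomial.
pub fn approx_msm_count(w: usize, a: usize, k: usize) -> usize {
    w + a + 1 + std::cmp::max(k.saturating_sub(1), w)
}

/// The row/column trade-off from the Poseidon example: multiplying the
/// number of columns by `factor` divides the number of rows by `factor`,
/// keeping the total number of table entries constant.
pub fn poseidon_tradeoff(rows: usize, cols: usize, factor: usize) -> (usize, usize) {
    (rows / factor, cols * factor)
}
```

For instance, the 36-row, 5-column Poseidon layout becomes a 12-row, 15-column layout with `factor = 3`.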
Overview
The proof system in Mina is a variant of PLONK. To understand PLONK, you can refer to our series of videos on the scheme. In this section we explain our variant, called kimchi.
kimchi is not formally a zk-SNARK, as it is not succinct in the proof size: zk-SNARKs must have a constant-size ($\mathcal{O}_\lambda(1)$) proof, independent of the number of gates $n$ in the circuit. In kimchi, due to the Bootleproof polynomial commitment approach, our proofs are $\mathcal{O}(\log d)$, where $d$ is the maximum degree of the committed polynomials (on the order of the number of rows). In practice, our proofs are on the order of dozens of kilobytes. Note that kimchi is a zk-SNARK in terms of verification complexity ($\mathcal{O}(\log n)$), and the proving complexity is quasilinear ($\mathcal{O}(n \log n)$) – recall that the prover cannot be faster than linear.
Essentially what PLONK allows you to do is to, given a program with inputs and outputs, take a snapshot of its execution. Then, you can remove parts of the inputs, or outputs, or execution trace, and still prove to someone that the execution was performed correctly for the remaining inputs and outputs of the snapshot. There are a lot of steps involved in the transformation of “I know an input to this program such that when executed with these other inputs it gives this output” to “here’s a sequence of operations you can execute to convince yourself of that”.
At the end of this chapter, you will understand that the protocol boils down to filling a number of tables (illustrated in the diagram below): tables that specify the circuit, and tables that describes an execution trace of the circuit (given some secret and public inputs).

Glossary
- size = number of rows
- columns = number of variables per row
- cell = a pair (row, column)
- witness = the values assigned in all the cells
- gate = polynomials that act on the variables in a row
- selector vector = a vector of values 1 or 0 (not true for constant vector I think) that toggles gates and variables in a row
- gadget = a series of contiguous rows with some specific gates set (via selector vectors)
Domain
Plonk needs a domain to encode the circuit on (for each point we can encode a row/constraint/gate). We choose a domain such that it is a multiplicative subgroup of the scalar field of our curve.
2-adicity
Furthermore, as the FFT (used for interpolation, see the section on FFTs) works best in domains of size $2^k$ for some $k$, we choose $\mathbb{H}$ to be a subgroup of order $2^k$.
Since the pasta curves both have scalar fields of prime orders $p$ and $q$, their multiplicative subgroups are of order $p - 1$ and $q - 1$ respectively (without the zero element). The pasta curves were generated specifically so that both $p - 1$ and $q - 1$ are multiples of $2^{32}$.
We say that they have 2-adicity of 32.
Looking at the definition of one of the pasta fields in Rust you can see that it is defined specifically for a trait related to FFTs:
```rust
impl FftParameters for FqParameters {
    type BigInt = BigInteger;

    const TWO_ADICITY: u32 = 32;

    #[rustfmt::skip]
    const TWO_ADIC_ROOT_OF_UNITY: BigInteger = BigInteger([
        0x218077428c9942de, 0xcc49578921b60494,
        0xac2e5d27b2efbee2, 0xb79fa897f2db056,
    ]);
}
```
The 2-adicity of 32 means that there's a multiplicative subgroup of size $2^{32}$ that exists in the field. The code above also defines a generator $\omega$ for it, such that $\omega^{2^{32}} = 1$ and $\omega^{2^k} \neq 1$ for $k < 32$ (so it's a primitive $2^{32}$-th root of unity).

[Lagrange's theorem](https://en.wikipedia.org/wiki/Lagrange%27s_theorem_(group_theory)) tells us that if we have a group of order $n$, then we'll have subgroups with orders dividing $n$. So in our case, we have subgroups of order every power of 2, up to $2^{32}$.
To find any of these subgroups, it is pretty straightforward as well. Notice that:

- let $\omega_{31} := \omega^2$, then $\omega_{31}^{2^{31}} = \omega^{2^{32}} = 1$, and so $\omega_{31}$ generates a subgroup of order $2^{31}$
- let $\omega_{30} := \omega^{2^2}$, then $\omega_{30}^{2^{30}} = \omega^{2^{32}} = 1$, and so $\omega_{30}$ generates a subgroup of order $2^{30}$
- and so on…
In arkworks you can see how this is implemented:
```rust
let size = n.next_power_of_two() as u64;
let log_size_of_group = ark_std::log2(usize::try_from(size).expect("too large"));

omega = Self::TWO_ADIC_ROOT_OF_UNITY;
for _ in log_size_of_group..Self::TWO_ADICITY {
    omega.square_in_place();
}
```
This allows you to easily find subgroups of different power-of-2 sizes, which is useful in zero-knowledge proof systems as FFT optimizations apply well on domains that are powers of 2. You can read more about that in the mina book.
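The same squaring loop can be reproduced over a toy field. Here we use $\mathbb{F}_{257}$, which has 2-adicity 8 (since $257 - 1 = 2^8$), with the generator 3 playing the role of `TWO_ADIC_ROOT_OF_UNITY`; these values are chosen for illustration and are not Mina's parameters.

```rust
/// Modular exponentiation by squaring.
fn pow_mod(mut base: u64, mut exp: u64, p: u64) -> u64 {
    let mut acc = 1;
    base %= p;
    while exp > 0 {
        if exp & 1 == 1 {
            acc = acc * base % p;
        }
        base = base * base % p;
        exp >>= 1;
    }
    acc
}

/// Starting from a primitive 2^two_adicity-th root of unity, squaring it
/// (two_adicity - log_size) times yields a generator of the subgroup of
/// order 2^log_size -- exactly the arkworks loop above.
pub fn subgroup_generator(two_adic_root: u64, two_adicity: u32, log_size: u32, p: u64) -> u64 {
    let mut omega = two_adic_root;
    for _ in log_size..two_adicity {
        omega = omega * omega % p;
    }
    omega
}

/// Check that omega has multiplicative order exactly 2^log_size.
pub fn is_primitive_root_of_order(omega: u64, log_size: u32, p: u64) -> bool {
    pow_mod(omega, 1 << log_size, p) == 1 && pow_mod(omega, 1 << (log_size - 1), p) != 1
}
```

Squaring 3 five times gives $3^{32} = 64$ in $\mathbb{F}_{257}$, a primitive $2^3$-th root of unity.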
Efficient computation of the vanishing polynomial
For each curve, we generate a domain as the set $\mathbb{H} = \{1, \omega, \omega^2, \ldots, \omega^{n-1}\}$.

As we work in a multiplicative subgroup, the polynomial that vanishes in all of these points can be written and efficiently calculated as $Z_{\mathbb{H}}(x) = x^n - 1$.

This is because, by definition, $\omega^n = 1$. If $x \in \mathbb{H}$, then $x = \omega^i$ for some $i$, and we have:

$$x^n = (\omega^i)^n = (\omega^n)^i = 1^i = 1$$

This optimization is important, as without it calculating the vanishing polynomial would take a number of operations linear in $n$ (which is not doable in a verifier circuit, for recursion). With the optimization, it takes us a logarithmic number of operations (using exponentiation by squaring) to calculate the vanishing polynomial.
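The two ways of evaluating the vanishing polynomial can be compared over the same toy field $\mathbb{F}_{257}$ (illustrative parameters; 64 generates a subgroup of order 8): the naive product takes $n$ multiplications, while the optimized form needs only a logarithmic number.

```rust
/// Modular exponentiation by squaring: O(log exp) multiplications.
fn pow_mod(mut base: u64, mut exp: u64, p: u64) -> u64 {
    let mut acc = 1;
    base %= p;
    while exp > 0 {
        if exp & 1 == 1 {
            acc = acc * base % p;
        }
        base = base * base % p;
        exp >>= 1;
    }
    acc
}

/// Naive evaluation of prod_i (x - omega^i): n multiplications.
pub fn vanishing_naive(x: u64, omega: u64, n: u64, p: u64) -> u64 {
    let mut acc = 1;
    let mut point = 1; // omega^0
    for _ in 0..n {
        acc = acc * ((x + p - point) % p) % p;
        point = point * omega % p;
    }
    acc
}

/// Optimized evaluation: x^n - 1.
pub fn vanishing_fast(x: u64, n: u64, p: u64) -> u64 {
    (pow_mod(x, n, p) + p - 1) % p
}
```

Both agree everywhere, and both vanish on the subgroup itself.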
Lagrange basis in multiplicative subgroups
What's a Lagrange basis?

The $i$-th Lagrange basis polynomial $L_i$ is the polynomial such that $L_i(\omega^j) = 1$ if $i = j$, and $L_i(\omega^j) = 0$ otherwise.
What’s the formula?
Arkworks has the formula to construct a Lagrange basis:

Evaluate all Lagrange polynomials at $\tau$ to get the Lagrange coefficients. Define the following as

- $\mathbb{H}$: The coset we are in, with generator $g$ and offset $h$
- $m$: The size of the coset $\mathbb{H}$
- $Z_{\mathbb{H}}$: The vanishing polynomial for $\mathbb{H}$, i.e. $Z_{\mathbb{H}}(x) = \prod_{i=0}^{m-1}(x - h g^i) = x^m - h^m$
- $v_i$: A sequence of values, where $v_0 = \frac{1}{m \cdot h^{m-1}}$, and $v_{i+1} = g \cdot v_i$

We then compute $L_{i,\mathbb{H}}(\tau)$ as

$$L_{i,\mathbb{H}}(\tau) = \frac{Z_{\mathbb{H}}(\tau) \cdot v_i}{\tau - h g^i}$$

However, if $\tau \in \mathbb{H}$, both the numerator and denominator equal 0 when $i$ corresponds to the value that $\tau$ equals, and the coefficient is 0 everywhere else. We handle this case separately, and we can easily detect it by checking if the vanishing polynomial evaluates to 0.

Following this, for $h = 1$ (so $\mathbb{H}$ is the subgroup generated by $g$) we have:

- $L_{0,\mathbb{H}}(\tau) = \frac{\tau^m - 1}{m(\tau - 1)}$
- $L_{1,\mathbb{H}}(\tau) = \frac{g \cdot (\tau^m - 1)}{m(\tau - g)}$
- and so on…
What’s the logic here?
https://en.wikipedia.org/wiki/Lagrange_polynomial#Barycentric_form
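The arkworks-style recurrence can be checked numerically in a toy field $\mathbb{F}_{257}$ with the order-8 subgroup generated by 64 and offset $h = 1$ (illustrative values): the coefficients produced by $Z_{\mathbb{H}}(\tau)\,v_i/(\tau - g^i)$ sum to 1, and interpolating the identity polynomial through them reproduces $\tau$.

```rust
const P: u64 = 257;

/// Modular exponentiation by squaring.
fn pow_mod(mut base: u64, mut exp: u64) -> u64 {
    let mut acc = 1;
    base %= P;
    while exp > 0 {
        if exp & 1 == 1 {
            acc = acc * base % P;
        }
        base = base * base % P;
        exp >>= 1;
    }
    acc
}

/// Fermat inversion in the prime field.
fn inv(x: u64) -> u64 {
    pow_mod(x, P - 2)
}

/// All Lagrange coefficients at tau for the subgroup of order m generated
/// by g (offset h = 1): v_0 = 1/m, v_{i+1} = g * v_i, and
/// L_i(tau) = Z_H(tau) * v_i / (tau - g^i). Assumes tau is not in the subgroup.
pub fn lagrange_coeffs(tau: u64, g: u64, m: u64) -> Vec<u64> {
    let z = (pow_mod(tau, m) + P - 1) % P; // Z_H(tau) = tau^m - 1
    let mut v = inv(m % P); // v_0 = 1 / (m * h^(m-1)) with h = 1
    let mut point = 1; // g^0
    let mut out = Vec::new();
    for _ in 0..m {
        out.push(z * v % P * inv((tau + P - point) % P) % P);
        v = v * g % P;
        point = point * g % P;
    }
    out
}
```

Summing the coefficients interpolates the constant polynomial 1; weighting them by the domain points interpolates the identity polynomial.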
non-interaction with fiat-shamir
So far we've talked about an interactive protocol between a prover and a verifier. The zero-knowledge proof was also in the honest-verifier zero-knowledge (HVZK) model, which is problematic.
In practice, we want to remove the interaction and have the prover produce a proof by themselves, that anyone can verify.
Public-coin protocols
Public-coin protocols are protocols where the messages of the verifier are simply random values. This is important, as our technique to transform an interactive protocol into a non-interactive one only works on public-coin protocols.
Fiat-Shamir trick
The whole idea is to replace the verifier by a random oracle, which in practice is a hash function. Note that by doing this, we remove potential leaks that can happen when the verifier acts dishonestly.
Initially the Fiat-Shamir transformation was only applied to sigma protocols, named after the Greek letter $\Sigma$ due to its shape resembling the direction of messages (prover sends a commitment to a verifier, verifier sends a challenge to the prover, prover sends the final proof to the verifier). A $Z$ would have made more sense but here we are.
Generalization of Fiat-Shamir
As our protocol has more than three moves, where every even move is a challenge from the verifier, we need to generalize Fiat-Shamir. This is simple: every verifier move can be replaced by a hash of the transcript (every message sent and received so far) to obtain a challenge.
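The "hash the transcript so far" rule can be sketched with Rust's standard hasher standing in for the random oracle. This is only a shape illustration: `DefaultHasher` is NOT cryptographic, and a real implementation would use a cryptographic sponge such as Poseidon.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Toy transcript: absorbs every prover message and derives each verifier
/// challenge as a hash of everything absorbed so far.
/// DefaultHasher is NOT cryptographic; it only illustrates the shape.
pub struct Transcript {
    messages: Vec<Vec<u8>>,
}

impl Transcript {
    pub fn new() -> Self {
        Transcript { messages: Vec::new() }
    }

    /// Record a prover message.
    pub fn absorb(&mut self, msg: &[u8]) {
        self.messages.push(msg.to_vec());
    }

    /// Replace a verifier move: challenge = H(transcript so far).
    pub fn challenge(&mut self) -> u64 {
        let mut h = DefaultHasher::new();
        self.messages.hash(&mut h);
        let c = h.finish();
        // Absorb the challenge too, so later challenges depend on it.
        self.messages.push(c.to_le_bytes().to_vec());
        c
    }
}
```

Two parties that absorbed the same messages derive the same challenges; as soon as the transcripts diverge, so do all subsequent challenges.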
In practice: a duplex construction as Merlin
While we use a hash function for that, a different construction called the duplex construction is particularly useful in such situations, as it allows one to continuously absorb the transcript and produce challenges, while automatically authenticating the transcript used to produce each challenge.
Merlin is a standardization of such a construction using the Strobe protocol framework (a framework to make use of a duplex construction). Note that the more recent Xoodyak (part of NIST’s lightweight competition) could have been used for this as well. Note also that Mina uses none of these standards, instead it simply uses Poseidon (see section on poseidon).
Plookup
Plookup allows us to check if witness values belong to a lookup table. This is usually useful for reducing the number of constraints needed for bit-wise operations. So in the rest of this document we'll use the XOR table as an example.

First, let’s define some terms:
- lookup table: a table of values that means something, like the XOR table above
- joint lookup table: a table where the columns have been joined together into a single column (with a challenge)
- table entry: a cell in a joint lookup table
- single lookup value: a value we’re trying to look up in a table
- joint lookup values: several values that have been joined together (with a challenge)
A joint lookup table looks like this, for some challenge:

Constraints
Computes the aggregation polynomial for maximum $n$ lookups per row, whose $k$-th entry is the product of terms

$$\frac{(\gamma(1 + \beta) + t_i + \beta t_{i+1}) \cdot (1 + \beta)^n \cdot \prod_{j} (\gamma + f_{i,j})}{\prod_{j} (\gamma(1 + \beta) + s_{i,j} + \beta s_{i+1,j})}$$

for $i < k$, where

- $t_i$ is the $i$-th entry in the table
- $f_{i,j}$ is the $j$-th lookup in the $i$-th row of the witness

For every instance of a value in $t$ and $f$, there is an instance of the same value in $s$.

$s$ is sorted in the same order as $t$, increasing along the 'snake-shape'.

Whenever the same value appears in $s_{i,j}$ and $s_{i+1,j}$, that term in the denominator product becomes

$$(1 + \beta)(\gamma + s_{i,j})$$

and this will cancel with the corresponding looked-up value in the witness, $(1 + \beta)(\gamma + f_{i,j})$.

Whenever the values $s_{i,j}$ and $s_{i+1,j}$ differ, that term in the denominator product will cancel with some matching

$$\gamma(1 + \beta) + t_i + \beta t_{i+1}$$

because the sorting is the same in $s$ and $t$.

There will be exactly the same number of these as the number of values in $t$, if $f$ only contains values from $t$. After multiplying all of the values, all of the terms will have cancelled if $s$ is a sorting of $f$ and $t$, and the final term will be $1$. Because of the random choice of $\beta$ and $\gamma$, there is negligible probability that the terms will cancel if $s$ is not a sorting of $f$ and $t$.
But because of the snakify:
- since we repeat an element between columns, we need a constraint checking that it's the same
- we invert the direction, but that shouldn't matter
FAQ
- how do we do multiple lookups per row?
- how do we dismiss rows where there are no lookups?
Maller optimization to reduce proof size
In the PLONK paper, they make use of an optimization from Mary Maller in order to reduce the proof size.
Explanation
Maller's optimization is used in the "polynomial dance" between the prover and the verifier to reduce the number of openings the prover sends.
Recall that the polynomial dance is the process where the verifier and the prover form polynomials together so that:
- the prover doesn’t leak anything important to the verifier
- the verifier doesn’t give the prover too much freedom
In the dance, the prover can additionally perform some steps that will keep the same properties but with reduced communication.
Let's see the protocol where the Prover wants to prove to the Verifier that

$$h_1(s) \cdot h_2(s) - h_3(s) = 0$$

given commitments of $h_1, h_2, h_3$.

A shorter proof exists. Essentially, if the verifier already has the opening `h1(s)`, they can reduce the problem to showing that

$$L(s) = 0 \quad \text{for} \quad L(X) = h_1(s) \cdot h_2(X) - h_3(X)$$

given commitments of $h_2, h_3$ and the evaluation of $h_1$ at the point $s$.

Notes
Why couldn’t the prover open the polynomial directly?
By doing

The problem here is that you can’t multiply the commitments together without using a pairing (if you’re using a pairing-based polynomial commitment scheme), and you can only use that pairing once in the protocol.
If you’re using an inner-product-based commitment, you can’t even multiply commitments.
question: where does the multiplication of commitments occur in the pairing-based protocol of PLONK? And how come we can use Bootleproof if we need that multiplication of commitments?
Appendix: Original explanation from the PLONK paper
https://eprint.iacr.org/2019/953.pdf

For completeness, lemma 4.7:

Overview
Here we explain how the Kimchi protocol design is translated into the proof-systems repository, from a high level perspective, touching briefly on all the involved aspects of cryptography. The concepts that we will be introducing can be studied more thoroughly by accessing the specific sections in the book.
In brief, the Kimchi protocol requires three different types of arguments (`Argument` in the code):
- Custom gates: they correspond to each of the specific functions performed by the circuit, which are represented by gate constraints.
- Permutation: the equality between different cells is constrained by copy constraints, which are represented by a permutation argument. It represents the wiring between gates, the connections from/to inputs and outputs.
- Lookup tables: for efficiency reasons, some public information can be stored by both parties (prover and verifier) instead of wired in the circuit. Examples of these are boolean functions.
All of these arguments are translated into equations that must hold for a correct witness for the full relation. Equivalently, this is to say that a number of expressions need to evaluate to zero on a certain set of numbers. So there are two problems to tackle here:
- Roots-check: Check that an equation evaluates to zero on a set of numbers.
- Aggregation: Check that the above holds for all of the equations at once.
Roots-check
For the first problem, given a polynomial $p(X)$ of degree $d$, we are asking to check that $p(x) = 0$ for all $x \in \mathbb{S}$, where $\mathbb{S}$ stands for set. Of course, one could manually evaluate each of the elements in the set and make sure of the above claim. But that would take too long (i.e. it wouldn't be succinct). Instead, we want to check that all at once. And a great way to do it is by using a vanishing polynomial. Such a polynomial $v_{\mathbb{S}}(X)$ will be nothing else than the smallest polynomial that vanishes on $\mathbb{S}$. That means, it is exactly defined as the degree-$|\mathbb{S}|$ polynomial formed by the product of the monomials:

$$v_{\mathbb{S}}(X) = \prod_{s \in \mathbb{S}} (X - s)$$
And why is this so advantageous? Well, first we need to make a key observation. Since the vanishing polynomial equals zero on every $x \in \mathbb{S}$, and it is the smallest such polynomial (recall it has the smallest possible degree so that this property holds), if our initial polynomial $p(X)$ evaluates to zero on $\mathbb{S}$, then it must be the case that $p(X)$ is a multiple of the vanishing polynomial $v_{\mathbb{S}}(X)$. But what does this mean in practice? By polynomial division, it simply means there exists a quotient polynomial $q(X)$ of degree $d - |\mathbb{S}|$ such that:

$$p(X) = q(X) \cdot v_{\mathbb{S}}(X)$$
And still, where's the hype? If you can provide such a quotient polynomial, one could easily check that if $p(r) = q(r) \cdot v_{\mathbb{S}}(r)$ for a random number $r \in \mathbb{F} \setminus \mathbb{S}$ (recall you will check at a point out of the set, otherwise you would get a trivial $0 = 0$ equality), then with very high probability that would mean that actually $p(X) = q(X) \cdot v_{\mathbb{S}}(X)$, meaning that $p(X)$ vanishes on the whole set $\mathbb{S}$, with just one point!
Let’s take a deeper look into the “magic” going on here. First, what do we mean by high probability? Is this even good enough? And the answer to this question is: as good as you want it to be.
First we analyse the math in this check. If the polynomial identity $p(X) = q(X) \cdot v_{\mathbb{S}}(X)$ actually holds, then of course for any possible $r \in \mathbb{F} \setminus \mathbb{S}$ the check will hold. But is there any unlucky instantiation of the point $r$ such that $p(r) = q(r) \cdot v_{\mathbb{S}}(r)$ but $p(X) \neq q(X) \cdot v_{\mathbb{S}}(X)$? And the answer is, yes, there are, BUT not many. But how many? How unlikely is this? You already know the answer to this: Schwartz-Zippel. Recalling this lemma:
Given two different polynomials $f(X)$ and $g(X)$ of degree $d$, they can at most intersect (i.e. coincide) in $d$ points. Or what's equivalent, letting $h(X) := f(X) - g(X)$, the polynomial $h(X)$ can only evaluate to $0$ in at most $d$ points (its roots).
Thus, if we interchange $f(X) \to p(X)$ and $g(X) \to q(X) \cdot v_{\mathbb{S}}(X)$, both of degree $d$, there are at most $d$ unlucky points $r$ that could trick you into thinking that $p(X)$ was a multiple of the vanishing polynomial (and thus being equal to zero on all of $\mathbb{S}$). So, how can you make this error probability negligible? By having a field size that is big enough (the formal definition says that the inverse of its size should decrease faster than any polynomial expression). Since we are working with fields of size about $2^{255}$, we are safe on this side!
Second, is this really faster than checking that $p(x) = 0$ for all $x \in \mathbb{S}$? At the end of the day, it seems like we need to evaluate $v_{\mathbb{S}}(r)$, and since this is a degree-$|\mathbb{S}|$ polynomial it looks like we are still performing about the same order of computations. But here comes math again. In practice, we want to define this set $\mathbb{S}$ to have a nice structure that allows us to perform some computations more efficiently than with arbitrary sets of numbers. Indeed, this set will normally be a multiplicative group, because in such groups the vanishing polynomial has an efficient representation $v_{\mathbb{S}}(X) = X^{|\mathbb{S}|} - 1$, which is much faster to evaluate than the above product.
Third, we may want to understand what happens with the evaluation of $q(r)$ instead. Since this is a polynomial of degree $d - |\mathbb{S}|$, it may look like this will as well take a lot of effort. But here's where cryptography comes into play, since the verifier will never get to evaluate the actual polynomial by themselves. Various reasons why. One, if the verifier had access to the full polynomial $q(X)$, then the prover should have sent it along with the proof, which would require about $d - |\mathbb{S}|$ coefficients to be represented (and this is no longer succinct for a SNARK). Two, this polynomial could carry some secret information, and if the verifier could recompute evaluations of it, they could learn some private data by evaluating on specific points. So instead, these evaluations will be a "mental game" thanks to polynomial commitments and proofs of evaluation sent by the prover (for whom a computation in the order of $d$ is not only acceptable, but necessary). The actual proof length will depend heavily on the type of polynomial commitments we are using. For example, in Kate-like commitments, committing to a polynomial takes a constant number of group elements (normally one), whereas in Bootleproof it is logarithmic. But in any case this will be shorter than sending $\mathcal{O}(d)$ elements.
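The whole roots-check can be replayed numerically over a toy field $\mathbb{F}_{257}$ (illustrative values: the element 241 generates a subgroup $\mathbb{S}$ of order 4): build $p(X) = q(X) \cdot v_{\mathbb{S}}(X)$, confirm it vanishes on $\mathbb{S}$, and test the identity at a single point outside $\mathbb{S}$.

```rust
const P: u64 = 257;

/// Modular exponentiation by squaring.
fn pow_mod(mut base: u64, mut exp: u64) -> u64 {
    let mut acc = 1;
    base %= P;
    while exp > 0 {
        if exp & 1 == 1 {
            acc = acc * base % P;
        }
        base = base * base % P;
        exp >>= 1;
    }
    acc
}

/// Evaluate a polynomial in coefficient form (lowest degree first),
/// using Horner's rule.
fn eval(coeffs: &[u64], x: u64) -> u64 {
    coeffs.iter().rev().fold(0, |acc, &c| (acc * x + c) % P)
}

/// Multiply two coefficient-form polynomials over F_P.
fn mul(a: &[u64], b: &[u64]) -> Vec<u64> {
    let mut out = vec![0u64; a.len() + b.len() - 1];
    for (i, &ai) in a.iter().enumerate() {
        for (j, &bj) in b.iter().enumerate() {
            out[i + j] = (out[i + j] + ai * bj) % P;
        }
    }
    out
}
```

With `v = [256, 0, 0, 0, 1]` (i.e. $X^4 - 1$) and an arbitrary quotient `q = [3, 1]` (i.e. $X + 3$), the product vanishes on the subgroup, and the single-point check at $r = 11 \notin \mathbb{S}$ passes.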
Aggregation
So far we have seen how to check that a polynomial equals zero on all of $\mathbb{S}$, with just a single point. This is somehow an aggregation per se. But we are left to analyse how we can prove such a thing for many polynomials. Altogether, if they hold, this will mean that the polynomials encode a correct witness and the relation would be satisfied. These checks can be performed one by one (checking that each of the quotients is indeed correct), or using an efficient aggregation mechanism and checking only one longer equation at once.
So what is the simplest way one could think of to perform this one-time check? Perhaps one could come up with the idea of adding up all of the equations $p_0(X), \ldots, p_n(X)$ into a longer one, $\sum_{i=0}^{n} p_i(X)$. But by doing this, we may be cancelling out terms and we could get an incorrect statement.
So instead, we can multiply each term in the sum by a random number. The reason why this trick works is the independence between random numbers. That is, given two polynomials $p_1(X)$ and $p_2(X)$ and random $\rho_1, \rho_2$, if the random combination $\rho_1 \cdot p_1(x) + \rho_2 \cdot p_2(x)$ equals zero on a given $x$, then with very high probability both $p_1(x) = 0$ and $p_2(x) = 0$. If applied to the whole statement, we could transform the $n$ equations into a single equation,
$$\bigwedge_{i=0}^{n} \left( p_i(X) \stackrel{?}{=} 0 \right) \iff_{\text{w.h.p.}} \sum_{i=0}^{n} \rho_i \cdot p_i(X) \stackrel{?}{=} 0$$
This sounds great so far. But we are forgetting about an important part of proof systems, which is proof length. For the above claim to be sound, the random values used for aggregation should be verifier-chosen, or at least prover-independent. So if the verifier had to communicate with the prover to inform them about the random values being used, we would get an overhead of $n$ field elements.
Instead, we take advantage of another technique that is called powers-of-alpha. Here, we make the assumption that powers $\alpha^i$ of a single random value $\alpha$ are indistinguishable from actual random values $\rho_i$. Then, we can twist the above claim to use only one random element $\alpha$ to be agreed with the prover as:
$$\bigwedge_{i=0}^{n} \left( p_i(X) \stackrel{?}{=} 0 \right) \iff_{\text{w.h.p.}} \sum_{i=0}^{n} \alpha^i \cdot p_i(X) \stackrel{?}{=} 0$$
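The soundness of the trick can be illustrated numerically in a toy field $\mathbb{F}_{257}$ (made-up evaluations): the combination $\sum_i \alpha^i p_i(r)$ vanishes when every $p_i(r)$ does, and can hide a nonzero evaluation only for a handful of unlucky $\alpha$ values.

```rust
const P: u64 = 257;

/// Aggregate the evaluations p_i(r) with powers of alpha:
/// sum_i alpha^i * p_i(r) mod P.
pub fn aggregate(evals: &[u64], alpha: u64) -> u64 {
    let mut acc = 0;
    let mut power = 1; // alpha^0
    for &e in evals {
        acc = (acc + power * e) % P;
        power = power * alpha % P;
    }
    acc
}
```

If one evaluation is nonzero, the combination is a nonzero polynomial of low degree in $\alpha$, so only a few of the 257 possible $\alpha$ values can make it vanish; over a field of size $\approx 2^{255}$ that probability becomes negligible.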
Arguments
In the previous section we saw how we can prove that certain equations hold for a given set of numbers very efficiently. What’s left to understand is the motivation behind these techniques. Why is it so important that we can perform these operations and what do these equations represent in the real world?
But first, let’s recall the table that summarizes some important notation that will be used extensively:

One of the first ideas that we must grasp is the notion of a circuit. A circuit can be thought of as a set of gates with wire connections between them. The simplest type of circuit that one could think of is a boolean circuit. Boolean circuits only have binary values: $0$ and $1$, true and false. From a very high level, boolean circuits are like an intricate network of pipes, and the values could be seen as water or no water. Then, gates will be like stopcocks, making water flow or not between the pipes. This video is a cool representation of this idea. Each of these behaviours represents a gate (i.e. logic gates). One can have circuits that can perform more operations, for instance arithmetic circuits. Here, the type of gates available will be arithmetic operations (additions and multiplications), wires can have numeric values, and we can perform any arbitrary computation.
But if we loop the loop a bit more, we could come up with a single Generic gate that could represent any arithmetic operation at once. This thoughtful type of gate is the one gate whose concatenation is used in Plonk to describe any relation. Apart from its wires, these gates are tied to an array of coefficients that help describe the functionality. But the problem with this gate is that it takes a large number of them to represent any meaningful function. So instead, recent Plonk-like proof systems have appeared which use custom gates to represent repeatedly used functionalities more efficiently than as a series of generic gates connected to each other. Kimchi is one of these protocols. Currently, we give support to specific gates for the Poseidon hash function, the CompleteAdd operation for curve points, VarBaseMul for variable base multiplication, EndoMulScalar for endomorphism variable base scalar multiplication, and the ChaCha encryption primitive. Nonetheless, we have plans to further support Cairo instructions, and ForeignMul for foreign field multiplication.
The circuit structure is known ahead of time, as this is part of the public information that is shared with the verifier. What is secret is what we call the witness of the relation, which is the correct instantiation of the wires of the circuit satisfying all of the checks. This means the verifier knows what type of gate appears in each part of the circuit, and the coefficients that are attached to each of them.
The execution trace refers to the state of all the wires throughout the circuit, upon instantiation of the witness. It will be represented as a table where the rows correspond to each gate and the columns refer to the actual wires of the gate (a.k.a. input and output registers) and some auxiliary values needed for the computation (a.k.a. advice registers). The current implementation of Kimchi considers a total of 15 columns with the first 7 columns corresponding to I/O registers. Additionally, gates are allowed to access the elements in the current row Curr and the next Next. The permutation argument acts on the I/O registers (meaning, it will check that the cells in the first 7 columns of the execution trace will be copied correctly to their destination cells). For this reason, these checks are also known as copy constraints.
Going back to the main motivation of the scheme, recall that we wanted to check that certain equations hold in a given set of numbers. Here's where this claim starts to make sense. The total number of rows in the execution trace will give us a domain. That is, we define a mapping between each of the row indices of the execution trace and the elements of a multiplicative group with as many elements as rows in the table. Two things to note here. First, if no such group exists we can pad with zeroes. Second, any multiplicative group has a generator $g$ whose powers generate the whole group. Then we can assign to each row $i$ the power $g^i$. Why do we want to do this? Because this will be the set over which we want to check our equations: we can transform a claim about the elements of a group to a claim like "these properties hold for each of the rows of this table". Interesting, right?
Custom gates
The goal of this section is to draw the connections between the practical aspect of designing custom gates and the theory behind it.
First, suppose we had the following layout of gates. This means that our circuit has four gates: a (single) generic gate, a (non-optimized) complete add, and then two different Example gates that we just make up for the example; ExamplePairs and ExampleTriples.
| row | GateType:: | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Generic | l | r | o | ||||||||||||
| 1 | CompleteAdd | x1 | y1 | x2 | y2 | x3 | y3 | inf | same_x | s | inf_z | x21_inv | ||||
| 2 | ExamplePairs | a | e | i | o | u | y | |||||||||
| 3 | ExampleTriples | a | b | c | d | e | f | g | h | i | j | k | l | m | n | o |
The most important aspect to take into account when you are designing a new custom gate is writing the actual description of the new functionality. For this, we are using the Expression framework that allows us to write equations generically by representing each column of the table (wire) as a variable. Then we can write operations between them and define the gate constraints. From a mathematical perspective, these will have the shape of multivariate polynomials of as many variables as columns in the table (currently 15 for Kimchi) that is constrained to equal 0.
In our example, we would define equations that represent the operations carried out by each of the gates. In the following equations we use the variable names that we wrote in the cells above, and we don't imply that the same name in a different row corresponds to the same value (we are not imposing equalities between gates yet):
Generic (with coefficients $c_0, \ldots, c_4$):

$$c_0 \cdot l + c_1 \cdot r + c_2 \cdot o + c_3 \cdot l \cdot r + c_4 = 0$$

CompleteAdd:

$$\begin{aligned}
& x_{21\_inv} \cdot (x_2 - x_1) - (1 - same_x) = 0 \\
& same_x \cdot (x_2 - x_1) = 0 \\
& same_x \cdot (2 s \cdot y_1 - 3 x_1^2) + (1 - same_x) \cdot ((x_2 - x_1) \cdot s - y_2 + y_1) = 0 \\
& x_1 + x_2 + x_3 - s^2 = 0 \\
& s \cdot (x_1 - x_3) - y_1 - y_3 = 0 \\
& (y_2 - y_1) \cdot (same_x - inf) = 0 \\
& (y_2 - y_1) \cdot inf_z - inf = 0
\end{aligned}$$

ExamplePairs (made up for the example; say it constrains each pair of wires to be equal):

$$a - e = 0, \qquad i - o = 0, \qquad u - y = 0$$

ExampleTriples (made up for the example; say it constrains each triple of wires to sum to zero):

$$a + b + c = 0, \quad d + e + f = 0, \quad g + h + i = 0, \quad j + k + l = 0, \quad m + n + o = 0$$
Nonetheless, when we write these constraints in proof-systems, we use a shared framework that helps us use the columns in each row as placeholders. For example, the i-th column of the current row Curr is represented as:
```rust
E::<F>::cell(Column::Witness(i), CurrOrNext::Curr)
```
where the E type is a shorthand for Expr<ConstantExpr>. More succinctly, we refer to the above simply as:
```rust
witness_curr(i)
```
For readability, let’s just refer to them here as w0..w14. Then, the above constraints would look like the following:
Generic:

$$c_0 \cdot w_0 + c_1 \cdot w_1 + c_2 \cdot w_2 + c_3 \cdot w_0 \cdot w_1 + c_4 = 0$$

CompleteAdd (with $x_1 \to w_0$, $y_1 \to w_1$, $x_2 \to w_2$, $y_2 \to w_3$, $x_3 \to w_4$, $y_3 \to w_5$, $inf \to w_6$, $same_x \to w_7$, $s \to w_8$, $inf_z \to w_9$, $x_{21\_inv} \to w_{10}$):

$$\begin{aligned}
& w_{10} \cdot (w_2 - w_0) - (1 - w_7) = 0 \\
& w_7 \cdot (w_2 - w_0) = 0 \\
& w_7 \cdot (2 w_8 \cdot w_1 - 3 w_0^2) + (1 - w_7) \cdot ((w_2 - w_0) \cdot w_8 - w_3 + w_1) = 0 \\
& w_0 + w_2 + w_4 - w_8^2 = 0 \\
& w_8 \cdot (w_0 - w_4) - w_1 - w_5 = 0 \\
& (w_3 - w_1) \cdot (w_7 - w_6) = 0 \\
& (w_3 - w_1) \cdot w_9 - w_6 = 0
\end{aligned}$$

ExamplePairs:

$$w_0 - w_1 = 0, \qquad w_2 - w_3 = 0, \qquad w_4 - w_5 = 0$$

ExampleTriples:

$$w_0 + w_1 + w_2 = 0, \quad w_3 + w_4 + w_5 = 0, \quad \ldots, \quad w_{12} + w_{13} + w_{14} = 0$$
Looking at the list of equations, each of them looks like a polynomial in the variables `w0..w14`, constrained to zero. But so far, the type of equations we had studied were univariate (i.e. $p(X)$ rather than $p(X_0, \ldots, X_{14})$). But this will not be a problem, since these multivariate equations will soon become univariate with a simple trick: interpolation. Right now, we are describing the gate constraints row-wise (i.e. horizontally). But as we said before, our domain will be determined by the number of rows; we will transform these equations to a vertical and univariate behaviour. The strategy to follow will be obtaining witness polynomials that evaluate to the cells of the execution trace.

First, note that so far we didn't mention any values for the `w0..w14` terms; instead they are placeholders for what is yet to come. And what is this, you may ask: the witness of the relation. Put another way, the witness will be the instantiation of the cells of the table that will make all of the above constraints hold. And where do we get these values from? Well, the prover will know or will compute them.
Second, we count the number of rows that we ended up with: here it's four (conveniently, a power of two). What we want to do is to create as many witness polynomials as the number of columns in the execution trace. These polynomials, that we can call $f_0(X), \ldots, f_{14}(X)$, must be designed to evaluate to the cells of the trace. Formally, we define each $f_i(X)$ so that:

$$f_i(g^j) = v_{i,j}$$

Meaning that $f_i(g^j)$ returns the $i$-th column of the $j$-th row of the execution trace, where $v_{i,j}$ denotes the value in the cell at column $i$ and row $j$. And how can we create such polynomials? We will use interpolation, which can be efficiently computed over power-of-two-sized groups, as is the case in the example. Recalling Lagrangian terms, they act like linearly independent selectors over a set, in particular:

Let $\mathbb{G}$ be a multiplicative group formed by the powers of a generator $g$. Given any element $g^j$ in the group, the $i$-th Lagrangian term evaluates to $L_i(g^j) = 1$ iff $i = j$ and $L_i(g^j) = 0$ iff $i \neq j$.

Given that our group has size $4$, the number of rows (ignoring ZK for now), our domain will be the set $\{1, g, g^2, g^3\}$. This means that we can build such witness polynomials as:

$$f_i(X) = \sum_{j=0}^{3} v_{i,j} \cdot L_j(X)$$
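This interpolation can be sketched over a toy field $\mathbb{F}_{257}$, where the element 241 generates a subgroup of order 4 matching our four rows (illustrative values, not Mina's field; `witness_poly` and `lagrange` are made-up helper names): build $f_i$ from one column of the trace and check that it returns each cell.

```rust
const P: u64 = 257;
const G: u64 = 241; // generator of an order-4 subgroup of F_257 (241^4 = 1)

/// Fermat inversion: x^(P-2) mod P.
fn inv(x: u64) -> u64 {
    let (mut base, mut exp, mut acc) = (x % P, P - 2, 1u64);
    while exp > 0 {
        if exp & 1 == 1 {
            acc = acc * base % P;
        }
        base = base * base % P;
        exp >>= 1;
    }
    acc
}

/// g^e mod P for small exponents.
fn pow_g(e: u32) -> u64 {
    (0..e).fold(1u64, |acc, _| acc * G % P)
}

/// j-th Lagrange term over the domain {1, g, g^2, g^3}:
/// equals 1 at g^j and 0 at the other three rows.
fn lagrange(j: u32, x: u64) -> u64 {
    let mut num = 1u64;
    let mut den = 1u64;
    for k in 0..4 {
        if k != j {
            num = num * ((x + P - pow_g(k)) % P) % P;
            den = den * ((pow_g(j) + P - pow_g(k)) % P) % P;
        }
    }
    num * inv(den) % P
}

/// Witness polynomial for one column of the trace: f(g^j) = column[j].
pub fn witness_poly(column: &[u64; 4], x: u64) -> u64 {
    (0..4u32).fold(0u64, |acc, j| (acc + column[j as usize] * lagrange(j, x)) % P)
}
```

Evaluating the resulting polynomial at row point $g^j$ recovers exactly the cell stored in row $j$.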
Then, the above constraints will become the following equations using our univariate witness polynomials:

Generic:

$$c_0 \cdot f_0(X) + c_1 \cdot f_1(X) + c_2 \cdot f_2(X) + c_3 \cdot f_0(X) \cdot f_1(X) + c_4 = 0$$

CompleteAdd:

$$\begin{aligned}
& f_{10}(X) \cdot (f_2(X) - f_0(X)) - (1 - f_7(X)) = 0 \\
& f_7(X) \cdot (f_2(X) - f_0(X)) = 0 \\
& f_7(X) \cdot (2 f_8(X) f_1(X) - 3 f_0(X)^2) + (1 - f_7(X)) \cdot ((f_2(X) - f_0(X)) f_8(X) - f_3(X) + f_1(X)) = 0 \\
& f_0(X) + f_2(X) + f_4(X) - f_8(X)^2 = 0 \\
& f_8(X) \cdot (f_0(X) - f_4(X)) - f_1(X) - f_5(X) = 0 \\
& (f_3(X) - f_1(X)) \cdot (f_7(X) - f_6(X)) = 0 \\
& (f_3(X) - f_1(X)) \cdot f_9(X) - f_6(X) = 0
\end{aligned}$$

ExamplePairs (made up for the example):

$$f_0(X) - f_1(X) = 0, \qquad f_2(X) - f_3(X) = 0, \qquad f_4(X) - f_5(X) = 0$$

ExampleTriples (made up for the example):

$$f_0(X) + f_1(X) + f_2(X) = 0, \quad \ldots, \quad f_{12}(X) + f_{13}(X) + f_{14}(X) = 0$$
These are the constraints that will need to be satisfied by a correct witness for the relation. We already know how to check many equations at once (combining them with powers of alpha), and how to check that the above holds for the full domain (by providing the quotient polynomial of the huge equation after dividing by the vanishing polynomial on the domain). All that's left to do is to establish a correspondence between each constraint and the particular rows of the execution trace to which it applies. This implies that, before even aggregating the constraints into a single one, we need to impose this restriction.
In proof-systems, we represent this restriction with the GateType selector. In particular, we would multiply each of the constraints above by the Expr term Index(GateType::YourType), which will be transformed (again) into mutually exclusive selector polynomials. This means that each row can have only one of these polynomials being nonzero (equal to one), whereas the evaluations of all the other selector polynomials on that row should be zero. These polynomials are known by the verifier, who can check that this is indeed the case.
But what will these polynomials look like? I am sure the above paragraph sounds familiar to you already. Once again, we will be able to build them using interpolation with our Lagrangian terms. In our example: the selector for Generic will equal one on the first row and zero elsewhere; the selector for CompleteAdd will equal one on the second row and zero elsewhere; the selector for ExamplePairs will equal one on the third row and zero elsewhere; and the selector for ExampleTriples will equal one on the fourth row and zero elsewhere. These polynomials are, in fact, a description of the circuit.
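A minimal sketch of these selector evaluations for a toy 4-row circuit (the gate names follow the text; everything else here is illustrative, not kimchi's real representation):

```rust
// Each selector column is the evaluation of one selector polynomial on the
// domain: 1 on the rows that use that gate type, 0 elsewhere.

#[derive(Clone, Copy, PartialEq, Eq, Debug)]
enum GateType {
    Generic,
    CompleteAdd,
    ExamplePairs,
    ExampleTriples,
}

/// selectors(rows, gates)[g][row] is 1 iff `row` uses gate `gates[g]`.
fn selectors(rows: &[GateType], gates: &[GateType]) -> Vec<Vec<u64>> {
    gates
        .iter()
        .map(|g| rows.iter().map(|r| if r == g { 1 } else { 0 }).collect())
        .collect()
}

fn main() {
    use GateType::*;
    let rows = [Generic, CompleteAdd, ExamplePairs, ExampleTriples];
    let gates = [Generic, CompleteAdd, ExamplePairs, ExampleTriples];
    let sels = selectors(&rows, &gates);
    // Mutual exclusivity: exactly one selector is nonzero on every row.
    for row in 0..rows.len() {
        let active: u64 = sels.iter().map(|col| col[row]).sum();
        assert_eq!(active, 1);
    }
}
```

Interpolating each of these 0/1 columns over the domain (exactly as done for the witness columns) yields the selector polynomials.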
Note that we already managed to make constraints for different gates independent from each other (thanks to their mutual exclusivity). Nonetheless, granting independence within each gate is still needed. Here's where we will be using the powers of alpha: within a gate the powers cannot be repeated, but they can be reused across gates. We will need as many different powers of alpha for this stage of the protocol as the largest number of different constraints within a single gate.
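The aggregation with powers of alpha can be sketched as follows (the field modulus 101 is an arbitrary toy choice): if every constraint value is zero, the aggregate is zero; a single violated constraint makes the aggregate nonzero for all but a few unlucky challenges.

```rust
// Toy field modulus (assumption, for illustration only).
const P: u64 = 101;

/// Compute sum_i alpha^i * c_i (mod P) over the constraint values c_i.
fn aggregate(constraints: &[u64], alpha: u64) -> u64 {
    let mut acc = 0u64;
    let mut power = 1u64;
    for &c in constraints {
        acc = (acc + power * c) % P;
        power = power * alpha % P;
    }
    acc
}

fn main() {
    // All constraints satisfied: the aggregate vanishes for any alpha.
    assert_eq!(aggregate(&[0, 0, 0], 33), 0);
    // One violated constraint shows up in the aggregate.
    assert_ne!(aggregate(&[0, 7, 0], 33), 0);
}
```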
Putting all of the above together, this means that our LONG and single constraint will look like:
Finally, providing this polynomial and performing the check on a random evaluation point would give the verifier the assurance of the following:
The prover knows a polynomial that equals zero on any element of the domain.
Nonetheless, it would still remain to verify that this polynomial actually corresponds to the encoding of the actual constraints. Meaning, checking that it encodes the column witnesses and the agreed circuit. So instead of providing just the aggregated polynomial (actually, a commitment to it), the prover can send commitments to each of the witness polynomials, so that the verifier can reconstruct the huge constraint using the encodings of the circuit (which are known ahead of time).
Permutation
TO-DO
Lookup
TO-DO
Snarky
Snarky is a frontend to the kimchi proof system.
It allows users to write circuits that can be proven using kimchi.
This part of the Mina book documents both how to use snarky, and how its internals work.
Snarky was originally an OCaml library. It is also available as a TypeScript library: SnarkyJS. This documentation covers the Rust implementation, which one can refer to as snarky-rs (but we will just say snarky from now on).
High-level design
Snarky is divided into two parts:
- Circuit-generation: also called the setup or compilation phase. It is when snarky turns code written using its library into a circuit that kimchi can understand. This can later be used by kimchi to produce prover and verifier keys.
- Witness-generation: also called the proving, or runtime, phase. It is when snarky executes the written program and records its state at various points in time to create an execution trace of the program (which we call the witness here). This can later be used by kimchi, with a proving key, to produce a zero-knowledge proof.
A snarky program is constructed using functions exposed by the library. The API of snarky that one can use to design circuits can be split in three categories:
- creation of snarky variables (via `compute()`)
- creation of constraints (via `assert`-type functions)
- manipulation of snarky variables (which can sometimes create constraints)
Snarky itself is divided into three parts:
- The high-level API that you can find in `api.rs` and `traits.rs`
- The snarky wrapper, which contains the logic for creating user variables and composed types (see the section on Snarky vars).
- The kimchi backend, which contains the logic for constructing the circuit as well as the witness.
API of Snarky
Snarky wrapper
Snarky, as of today, is constructed as two parts:
- a snarky wrapper, which is explained in this document
- a backend underneath that wrapper, explained in the kimchi backend section
This separation exists for legacy reasons, and ideally we should merge the two into a single library.
The snarky wrapper mostly exists in checked_runner.rs, and has the following state:
```rust
pub struct RunState<F>
where
    F: PrimeField,
{
    /// The constraint system used to build the circuit.
    /// If not set, the constraint system is not built.
    system: Option<SnarkyConstraintSystem<F>>,

    /// The public input of the circuit used in witness generation.
    // TODO: can we merge public_input and private_input?
    public_input: Vec<F>,

    // TODO: we could also just store `usize` here
    pub(crate) public_output: Vec<CVar<F>>,

    /// The private input of the circuit used in witness generation.
    /// Still not sure what that is, or why we care about this.
    private_input: Vec<F>,

    /// If set, the witness generation will check if the constraints are satisfied.
    /// This is useful to simulate running the circuit and return an error if an assertion fails.
    eval_constraints: bool,

    /// The number of public inputs.
    num_public_inputs: usize,

    /// A counter used to track variables (this includes public inputs) as they're being created.
    next_var: usize,

    /// Indication that we're running the witness generation (as opposed to the circuit creation).
    mode: Mode,
}
```
The wrapper is designed to be used in different ways, depending on the fields set.
Ideally, we would like to only run this once and obtain a result that’s an immutable compiled artifact.
Currently, public_input, private_input, eval_constraints, next_var, and mode all need to be mutable.
In the future these should be passed as arguments to functions, and should not exist in the state.
Public output
The support for public output is implemented as kind of a hack.
When the developer writes a circuit, they have to specify the type of the public output.
This allows the API to save enough room at the end of the public input, and store the variables used in the public output in the state.
When the API calls the circuit written by the developer, it expects the public output (as a snarky type) to be returned by the function. The compilation or proving API that ends up calling that function, can thus obtain the variables of the public output. With that in hand, the API can continue to write the circuit to enforce an equality constraint between these variables being returned and the public output variable that it had previously stored in the state.
Essentially, the kimchi backend will turn this into as many wirings as there are CVars in the public output.
During witness generation, we need a way to modify the witness once we know the values of the public output.
As the public output CVars were generated from the snarky wrapper (and not from the kimchi backend), the snarky wrapper should know their values after running the given circuit.
Kimchi Backend

Underneath the snarky wrapper (in snarky/checked_runner.rs) lies what we used to call the plonk_constraint_system or kimchi_backend, in snarky/constraint_system.rs.
It is good to note that we’re planning on removing this abstract separation between the snarky wrapper and the constraint system.
The logic in the kimchi backend serves two purposes:
- Circuit generation. It is the logic that adds gates to our list of gates (representing the circuit). For most of these gates, the variables used are passed to the backend by the snarky wrapper, but some of them are created by the backend itself (see more in the variables section).
- Witness generation. It is the logic that creates the witness.
One can also perform two additional operations once the constraint system has been compiled:
- Generate the prover and verifier index for the system.
- Get a hash of the constraint system (this includes the circuit and the number of public inputs) (TODO: verify that this is true) (TODO: what else should be in that hash? a version of snarky and a version of kimchi?).
A circuit
A circuit is either being built, or has been constructed during a circuit generation phase:
```rust
enum Circuit<F>
where
    F: PrimeField,
{
    /** A circuit still being written. */
    Unfinalized(Vec<GateSpec<(), F>>),
    /** Once finalized, a circuit is represented as a digest
    and a list of gates that corresponds to the circuit. */
    Compiled([u8; 32], Vec<CircuitGate<F>>),
}
```
State
The state of the kimchi backend looks like this:
```rust
pub struct SnarkyConstraintSystem<Field>
where
    Field: PrimeField,
{
    /// A counter used to track variables
    /// (similar to the one in the snarky wrapper)
    next_internal_var: usize,

    /// Instruction on how to compute each internal variable
    /// (as a linear combination of other variables).
    /// Used during witness generation.
    internal_vars: HashMap<InternalVar, (Vec<(Field, V)>, Option<Field>)>,

    /// The symbolic execution trace table.
    /// Each cell is a variable that takes a value during witness generation.
    /// (if not set, it will take the value 0).
    rows: Vec<Vec<Option<V>>>,

    /// The circuit once compiled
    gates: Circuit<Field>,

    /// The row to use the next time we add a constraint.
    // TODO: I think we can delete this
    next_row: usize,

    /// The size of the public input
    /// (which fills the first rows of our constraint system).
    public_input_size: Option<usize>,

    // omitted values...
}
```
Variables
In the backend, there are two types of variables:
```rust
enum V {
    /// An external variable
    /// (generated by snarky, via [exists]).
    External(usize),
    /// An internal variable is generated to hold an intermediate value
    /// (e.g. in reducing linear combinations to single PLONK positions).
    Internal(InternalVar),
}
```
Internal variables are basically a usize pointing to a hashmap in the state.
That hashmap tells you how to compute the internal variable during witness generation: it is always a linear combination of other variables (and a constant).
Circuit generation
During circuit generation, the snarky wrapper will make calls to the add_constraint() or add_basic_snarky_constraint function of the kimchi backend, specifying what gate to use and what variables to use in that gate.
At this point, the snarky wrapper might have some variables that are not yet tracked as such (with a counter). Rather, they are constants, or they are a combination of other variables. You can see that as a small AST representing how to compute a variable. (See the variables section for more details).
For this reason, they can hide a number of operations that haven’t been constrained yet.
It is the role of the add_constraint logic to enforce that, at this point, constants as well as linear combinations or scalings of variables are encoded in the circuit.
This is done by adding enough generic gates (using the reduce_lincom() or reduce_to_var() functions).
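A hypothetical mini-version of that reduction, for intuition: fold a CVar AST into a constant plus per-variable coefficients, which is the shape a generic gate can then encode. The enum mirrors the CVar type shown later in this document, but the function and field modulus here are illustrative, not the actual reduce_lincom() signature.

```rust
use std::collections::HashMap;

// Toy field modulus (assumption, for illustration only).
const P: u64 = 101;

#[derive(Clone)]
enum CVar {
    Constant(u64),
    Var(usize),
    Add(Box<CVar>, Box<CVar>),
    Scale(u64, Box<CVar>),
}

/// Fold an AST into (constant term, coefficient per variable).
fn reduce(cvar: &CVar) -> (u64, HashMap<usize, u64>) {
    fn go(c: &CVar, scale: u64, constant: &mut u64, coeffs: &mut HashMap<usize, u64>) {
        match c {
            CVar::Constant(x) => *constant = (*constant + scale * x) % P,
            CVar::Var(i) => {
                let e = coeffs.entry(*i).or_insert(0);
                *e = (*e + scale) % P;
            }
            CVar::Add(a, b) => {
                go(a, scale, constant, coeffs);
                go(b, scale, constant, coeffs);
            }
            CVar::Scale(k, a) => go(a, scale * k % P, constant, coeffs),
        }
    }
    let mut constant = 0;
    let mut coeffs = HashMap::new();
    go(cvar, 1, &mut constant, &mut coeffs);
    (constant, coeffs)
}

fn main() {
    // 3 * (v0 + v0) + 5  ==>  constant 5, coefficient 6 on v0
    let x = CVar::Add(
        Box::new(CVar::Scale(
            3,
            Box::new(CVar::Add(Box::new(CVar::Var(0)), Box::new(CVar::Var(0)))),
        )),
        Box::new(CVar::Constant(5)),
    );
    let (constant, coeffs) = reduce(&x);
    assert_eq!(constant, 5);
    assert_eq!(coeffs[&0], 6);
}
```

The real backend additionally emits the generic gates that pin each reduced term down; here we only show the folding itself.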
This is a remnant of an optimization targeting R1CS (in which additions are free).
An issue with this approach is the following: imagine that two circuit variables are created from the same circuit variable, and that the original circuit variable contained a long AST; then both new variables might end up creating the same constraints to convert that AST.
Currently, snarkyjs and pickles expose a seal() function that allows you to mitigate this issue, at the cost of some manual work and mental tracking from the developer. We should probably get rid of this, while making sure that we can continue to optimize generic gates (in some cases you can merge two generic gates into one (TODO: give an example of where that can happen)).
Another solution is to keep track of what was reduced, and reuse previous reductions (similar to how we handle constants).
It is during this “reducing” step that internal variables (known only to the kimchi backend) are created.
The process is quite safe, as the kimchi backend cannot use the snarky wrapper variables directly (which are of type CVar).
Since the expected format (see the variables section) is a number (of type usize), the only way to convert a non-tracked variable (a constant, a scaling, or a linear combination) is to reduce it (and in the process constrain its value).
Depending on the gate being used, several constraints might be added via the add_row() function, which does three things:
- figure out if there’s any wiring to be done
- add a gate to our list of gates (representing the circuit)
- add the variables to our symbolic execution trace table (symbolic in the sense that nothing has values yet)
This process happens as the circuit is “parsed” and the constraint functions of the kimchi backend are called.
This does not lead to a finalized circuit, see the next section to see how that is done.
(TODO: ideally this should happen in the same step)
Finalization of the circuit.
So far we’ve only talked about adding specific constraints to the circuit, but not about how public input are handled.
The finalization() function of the kimchi backend does the following:
- add as many generic rows as there are public inputs.
- construct the permutation
- compute a cache of the circuit (TODO: this is so unnecessary)
- and other things that are not that important
Witness generation
Witness generation happens by taking the finalized state (in the compute_witness() function) with a callback that can be used to retrieve the values of external variables (public input and public output).
The algorithm follows these steps using the symbolic execution table we built during circuit generation:
- initialize the execution trace table with zeros
- go through the rows related to the public input and set the left-most column values to the ones obtained via the callback
- go through the other rows and compute the value of the variables left in the table
Variables in the last step should either:
- be absent (`None`) and evaluate to the default value 0
- point to an external variable, in which case the closure passed can be used to retrieve the value
- be an internal variable, in which case the value is computed by evaluating the AST that was used to create it.
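The steps above can be sketched as follows. The V enum and the internal_vars map mirror the backend's shapes in simplified form; the field modulus and helper shapes are illustrative assumptions, and nested internal variables are left out.

```rust
use std::collections::HashMap;

// Toy field modulus (assumption, for illustration only).
const P: u64 = 101;

#[derive(Clone, Copy, PartialEq, Eq, Hash)]
enum V {
    External(usize),
    Internal(usize),
}

/// Fill the symbolic trace into a concrete table: external values come from
/// the callback, internal ones from their linear combination, missing cells
/// default to 0.
fn compute_witness(
    rows: &[Vec<Option<V>>],
    external: impl Fn(usize) -> u64,
    internal_vars: &HashMap<usize, (Vec<(u64, V)>, Option<u64>)>,
) -> Vec<Vec<u64>> {
    let value = |v: V| -> u64 {
        match v {
            V::External(i) => external(i),
            V::Internal(i) => {
                let (terms, constant) = &internal_vars[&i];
                let mut acc = constant.unwrap_or(0);
                for &(coeff, var) in terms {
                    // In this sketch, internal vars only combine external ones.
                    let x = match var {
                        V::External(j) => external(j),
                        V::Internal(_) => unimplemented!("nested internal vars"),
                    };
                    acc = (acc + coeff * x) % P;
                }
                acc
            }
        }
    };
    rows.iter()
        .map(|row| row.iter().map(|cell| cell.map(value).unwrap_or(0)).collect())
        .collect()
}

fn main() {
    // internal var 0 = 2 * external(0) + 7
    let mut internal_vars = HashMap::new();
    internal_vars.insert(0, (vec![(2, V::External(0))], Some(7)));
    let rows = vec![vec![Some(V::External(0)), Some(V::Internal(0)), None]];
    let table = compute_witness(&rows, |i| [10][i], &internal_vars);
    assert_eq!(table, vec![vec![10, 27, 0]]);
}
```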
Permutation
The permutation is used to wire cells of the execution trace table (specifically, cells belonging to the first 7 columns). It is also known as “copy constraints”.
In snarky, the permutation is represented differently from kimchi, and thus needs to be converted to kimchi's format before a proof can be created. (TODO: merge the representations)
We use the permutation in ingenious ways to optimize circuits. For example, we use it to encode each constant once, and wire it to the places where it is used. Another example is that we use it to assert equality between two cells.
Implementation details
There’s two aspect of the implementation of the permutation, the first one is a hashmap of equivalence classes, which is used to track all the positions of a variable, the second one is making use of a union find data structure to link variables that are equivalent (we’ll talk about that after).
The two data structures are in the kimchi backend’s state:
```rust
pub struct SnarkyConstraintSystem<Field>
where
    Field: PrimeField,
{
    equivalence_classes: HashMap<V, Vec<Position<Row>>>,
    union_finds: disjoint_set::DisjointSet<V>,
    // omitted fields...
}
```
equivalence classes
As said previously, during circuit generation a symbolic execution trace table is created. It should look a bit like this (if there were only 3 columns and 4 rows):
|   | 0  | 1  | 2  |
|---|----|----|----|
| 0 | v1 | v1 |    |
| 1 |    | v2 |    |
| 2 |    |    | v2 |
| 3 |    |    | v1 |
From that, it should be clear that all the cells containing the variable v1 should be connected,
and all the cells containing the variable v2 should be as well.
The format that the permutation expects is a cycle: a list of cells where each cell is linked to the next, the last one wrapping around and linking to the first one.
For example, a cycle for the v1 variable could be:
```
(0, 0) -> (0, 1)
(0, 1) -> (3, 2)
(3, 2) -> (0, 0)
```
During circuit generation, a hashmap (called equivalence_classes) is used to track all the positions (row and column) of each variable.
During finalization, all the different cycles are created by looking at all the variables existing in the hashmap.
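A sketch of that cycle construction, using the v1 positions from the example table (the function shape is illustrative; the real code works on kimchi's Position type):

```rust
use std::collections::HashMap;

type Position = (usize, usize); // (row, column)

/// From the positions of each variable (the equivalence classes), produce
/// the wiring edges cell -> next cell, wrapping around to form a cycle.
fn cycles(classes: &HashMap<&str, Vec<Position>>) -> HashMap<String, Vec<(Position, Position)>> {
    classes
        .iter()
        .map(|(var, positions)| {
            let edges = (0..positions.len())
                .map(|i| (positions[i], positions[(i + 1) % positions.len()]))
                .collect();
            (var.to_string(), edges)
        })
        .collect()
}

fn main() {
    let mut classes = HashMap::new();
    classes.insert("v1", vec![(0, 0), (0, 1), (3, 2)]);
    let wiring = cycles(&classes);
    // Matches the v1 cycle shown above.
    assert_eq!(
        wiring["v1"],
        vec![((0, 0), (0, 1)), ((0, 1), (3, 2)), ((3, 2), (0, 0))]
    );
}
```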
Union finds
Sometimes, we know that two variables will have equivalent values due to an assert_equal() being called to link them.
Since we link two variables together, they need to be part of the same cycle, and as such we need to be able to detect that to construct correct cycles.
To do this, we use a union find data structure, which allows us to easily find the unions of equivalent variables.
When an assert_equal() is called, we link the two variables together using the union_finds data structure.
During finalization, when we create the cycles, we use the union_finds data structure to find the equivalent variables.
We then create a new equivalence classes hashmap to merge the keys (variables) that are in the same set.
This is done before using the equivalence classes hashmap to construct the cycles.
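A sketch of that merging step, using a hand-rolled union-find in place of the DisjointSet type used by the real code (variable names and shapes here are illustrative):

```rust
use std::collections::HashMap;

struct UnionFind {
    parent: Vec<usize>,
}

impl UnionFind {
    fn new(n: usize) -> Self {
        Self { parent: (0..n).collect() }
    }
    fn find(&mut self, x: usize) -> usize {
        if self.parent[x] != x {
            let root = self.find(self.parent[x]);
            self.parent[x] = root; // path compression
        }
        self.parent[x]
    }
    fn union(&mut self, a: usize, b: usize) {
        let (ra, rb) = (self.find(a), self.find(b));
        self.parent[ra] = rb;
    }
}

/// Merge the position lists of variables that were asserted equal.
fn merge_classes(
    classes: HashMap<usize, Vec<(usize, usize)>>,
    uf: &mut UnionFind,
) -> HashMap<usize, Vec<(usize, usize)>> {
    let mut merged: HashMap<usize, Vec<(usize, usize)>> = HashMap::new();
    for (var, mut positions) in classes {
        merged.entry(uf.find(var)).or_default().append(&mut positions);
    }
    merged
}

fn main() {
    let mut uf = UnionFind::new(3);
    uf.union(0, 2); // assert_equal(v0, v2)
    let mut classes = HashMap::new();
    classes.insert(0, vec![(0, 0)]);
    classes.insert(1, vec![(1, 0)]);
    classes.insert(2, vec![(2, 1)]);
    let merged = merge_classes(classes, &mut uf);
    // v0 and v2 now share one equivalence class, v1 keeps its own.
    assert_eq!(merged.len(), 2);
}
```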
Vars
In this section we will introduce two types of variables:
- Circuit vars, or `CVar`s, which are low-level variables representing field elements.
- Snarky vars, which are high-level variables that users can use to create more meaningful programs.
Circuit vars
In snarky, we first define circuit variables (TODO: rename Field variable?) which represent field elements in a circuit. These circuit variables, or cvars, can be represented differently in the system:
```rust
pub enum CVar<F>
where
    F: PrimeField,
{
    /// A constant.
    Constant(F),
    /// A variable that can be referred to via a `usize`.
    Var(usize),
    /// The addition of two other [CVar]s.
    Add(Box<CVar<F>>, Box<CVar<F>>),
    /// Scaling of a [CVar].
    Scale(F, Box<CVar<F>>),
}
```
One can see a CVar as an AST, where two atoms exist: a Var(usize), which represents a private input, and a Constant(F), which represents a constant.
Anything else represents combinations of these two atoms.
Constants
Note that a circuit variable does not represent a value that has been constrained in the circuit (yet). This is why we need to know if a cvar is a constant, so that we can avoid constraining it too early. For example, the following code does not encode 2 or 1 in the circuit, but will encode 3:
```rust
let x: CVar = state.exists(|_| 2) + state.exists(|_| 1);
state.assert_eq(x, y); // 3 and y will be encoded in the circuit
```
whereas the following code will encode all variables:
```rust
let x = y + y;
let one: CVar = state.exists(|_| 1);
assert_eq(x, one);
```
Non-constants
Right after being created, a CVar is not constrained yet, and needs to be constrained by the application.
That is, unless the application wants the CVar to be a constant that will not need to be constrained (see the previous example), or wants the CVar to be a random value (unlikely) (TODO: we should add a "rand" function for that).
In any case, a circuit variable which is not a constant has a value that is not known yet at circuit-generation time. In some situations, we might not want to constrain it right away.
When do variables get constrained?
In general, a circuit variable only gets constrained by an assertion call like assert or assert_equals.
When variables are added together, or scaled, they do not directly get constrained. This is due to optimizations targeting R1CS (which we don't support anymore) that were implemented in the original snarky library, and that we have kept in snarky-rs.
Imagine the following example:
```rust
let y = x1 + x2 + x3 /* + ... */;
let z = y + 3;
assert_eq(y, 6);
assert_eq(z, 7);
```
The first two lines will not create constraints, but simply create minimal ASTs that track all of the additions.
Both assert calls will then reduce the variables to a single circuit variable, creating the same constraints twice.
For this reason, there’s a function seal() defined in pickles and snarkyjs. (TODO: more about seal(), and why is it not in snarky?) (TODO: remove the R1CS optimization)
Snarky vars
Handling CVars can be cumbersome, as they can only represent a single field element.
We might want to represent values that are either in a smaller range (e.g. booleans) or that are made out of several CVars.
For this, snarky’s API exposes the following trait, which allows users to define their own types:
```rust
pub trait SnarkyType<F>: Sized
where
    F: PrimeField,
{
    /// ?
    type Auxiliary;

    /// The equivalent type outside of the circuit.
    type OutOfCircuit;

    const SIZE_IN_FIELD_ELEMENTS: usize;

    fn to_cvars(&self) -> (Vec<CVar<F>>, Self::Auxiliary);

    fn from_cvars_unsafe(cvars: Vec<CVar<F>>, aux: Self::Auxiliary) -> Self;

    fn check(&self, cs: &mut RunState<F>);

    fn deserialize(&self) -> (Self::OutOfCircuit, Self::Auxiliary);

    fn serialize(out_of_circuit: Self::OutOfCircuit, aux: Self::Auxiliary) -> Self;

    fn constraint_system_auxiliary() -> Self::Auxiliary;

    fn value_to_field_elements(x: &Self::OutOfCircuit) -> (Vec<F>, Self::Auxiliary);

    fn value_of_field_elements(x: (Vec<F>, Self::Auxiliary)) -> Self::OutOfCircuit;
}
```
Such types are always handled as OutOfCircuit types (e.g. bool) by the users, and as a type implementing SnarkyType by snarky (e.g. Boolean).
Thus, the user can pass them to snarky in two ways:
- As public inputs. In this case, they will be serialized into field elements for snarky before witness-generation (via the value_to_field_elements() function).
- As private inputs. In this case, they must be created using the compute() function with a closure returning an OutOfCircuit value by the user.
The call to compute() will need to have some type hint, for snarky to understand what SnarkyType it is creating.
This is because the relationship is currently only one-way: a SnarkyType knows what out-of-circuit type it relates to, but the other way around is not true.
(TODO: should we implement that though?)
A SnarkyType always implements a check() function, which is called by snarky when compute() is called to create such a type.
The check() function is responsible for creating the constraints that sanitize the newly-created SnarkyType (and its underlying CVars).
For example, creating a boolean would make sure that the underlying CVar is either 0 or 1.
Booleans
Booleans are a good example of a snarky variable.
```rust
pub struct Boolean<F: PrimeField>(CVar<F>);

impl<F> SnarkyType<F> for Boolean<F>
where
    F: PrimeField,
{
    type Auxiliary = ();
    type OutOfCircuit = bool;

    const SIZE_IN_FIELD_ELEMENTS: usize = 1;

    fn to_cvars(&self) -> (Vec<CVar<F>>, Self::Auxiliary) {
        (vec![self.0.clone()], ())
    }

    fn from_cvars_unsafe(cvars: Vec<CVar<F>>, _aux: Self::Auxiliary) -> Self {
        assert_eq!(cvars.len(), Self::SIZE_IN_FIELD_ELEMENTS);
        Self(cvars[0].clone())
    }

    fn check(&self, cs: &mut RunState<F>) {
        // TODO: annotation?
        cs.assert_(
            Some("boolean check"),
            vec![BasicSnarkyConstraint::Boolean(self.0.clone())],
        );
    }

    fn deserialize(&self) -> (Self::OutOfCircuit, Self::Auxiliary) {
        todo!()
    }

    fn serialize(out_of_circuit: Self::OutOfCircuit, aux: Self::Auxiliary) -> Self {
        todo!()
    }

    fn constraint_system_auxiliary() -> Self::Auxiliary {
        todo!()
    }

    fn value_to_field_elements(x: &Self::OutOfCircuit) -> (Vec<F>, Self::Auxiliary) {
        todo!()
    }

    fn value_of_field_elements(x: (Vec<F>, Self::Auxiliary)) -> Self::OutOfCircuit {
        todo!()
    }
}
```
Check
The check() function simply constrains the CVar $x$ to be either $0$ or $1$ using the following constraint: $x \cdot (x - 1) = 0$.
It is trivial to use the double generic gate for this.
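Sketched concretely over a toy field (the modulus 101 is an arbitrary illustration; inputs are assumed already reduced mod 101):

```rust
const P: u64 = 101;

/// Evaluate x * (x - 1) = x^2 - x, reduced mod P.
/// It is zero exactly when x is 0 or 1.
fn boolean_constraint(x: u64) -> u64 {
    (x * x % P + P - x % P) % P
}

fn main() {
    assert_eq!(boolean_constraint(0), 0); // 0 is boolean
    assert_eq!(boolean_constraint(1), 0); // 1 is boolean
    assert_ne!(boolean_constraint(2), 0); // 2 is not
}
```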
And

The conjunction of two booleans $x$ and $y$ can be computed with a single multiplication: $x \wedge y = x \cdot y$.

Not

The negation of a boolean $x$ is its complement: $\neg x = 1 - x$.

Or

The disjunction follows from the operations above: $x \vee y = x + y - x \cdot y$.
Circuit generation
In circuit generation mode, the has_witness field of RunState is set to the default CircuitGeneration, and the program of the user is run to completion.
During the execution, the different snarky functions called on RunState will create internal variables as well as constraints.
Creation of variables
Variables can be created via the compute() function, which takes two arguments:
- A `TypeCreation` toggle, which is either set to `Checked` or `Unsafe`. We will describe this below.
- A closure representing the actual computation behind the variable. This computation will only take place when real values are computed, and can be non-deterministic (e.g. random, or external values provided by the user). Note that the closure takes one argument: a `WitnessGeneration`, a structure that allows you to read the runtime values of any variables that were previously created in your program.
The compute() function also needs a type hint to understand what type of snarky type it is creating.
It then performs the following steps:
- creates enough `CVar`s to hold the value to be created
- retrieves the auxiliary data needed to create the snarky type (TODO: explain auxiliary data) and creates the snarky variable out of the `CVar`s and the auxiliary data
- if the `TypeCreation` is set to `Checked`, calls the `check()` function on the snarky type (which will constrain the value created); if it is set to `Unsafe`, does nothing (in which case we're trusting that the value cannot be malformed; this is mostly used internally, and it is highly likely that users directly making use of `Unsafe` are writing bugs)
At this point we only created variables to hold future values, and made sure that they are constrained.
The actual values will fill the room created by the CVars only during witness generation.
Constraints
All other functions exposed by the API are basically here to operate on variables and create constraints in doing so.
Witness generation
In snarky, currently, the same code is run through again to generate the witness.
That is, the RunState contains a few changes:
- `public_input: Vec<F>`: now contains concrete values (instead of being empty).
- `has_witness`: is set to `WitnessGeneration`.
Additionally, if we want to verify that the arguments are actually correct (and that the program implemented does not fail) we can also set eval_constraints to true (it defaults to false) to verify that the program has a correct state at all points in time.
If we do not do this, the user will only detect failure during proof generation (specifically when the composition polynomial is divided by the vanishing polynomial).
This is implemented by simply checking that each generic gate encountered is correct, in relation to the witness values observed in that row (extrapolated to the double generic gate). Note that other custom gates are not checked, as they are wrapped by gadgets which fill in the witness values instead of the user. Thus there is no room for user error (i.e. the user entering a wrong private input).
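A sketch of that per-row check, assuming the usual generic-gate shape (coefficients for the left, right and output wires, a multiplication term, and a constant; the toy field modulus 101 is an illustration, not a kimchi parameter):

```rust
const P: u64 = 101;

/// Check c_l*l + c_r*r + c_o*o + c_m*l*r + c_c == 0 (mod P)
/// for one generic gate against the witness values of its row.
fn generic_gate_ok(coeffs: [u64; 5], l: u64, r: u64, o: u64) -> bool {
    let [cl, cr, co, cm, cc] = coeffs;
    (cl * l + cr * r + co * o + cm * l % P * r + cc) % P == 0
}

fn main() {
    // Encode l + r - o = 0 and check the witness row (2, 3, 5).
    let add = [1, 1, P - 1, 0, 0];
    assert!(generic_gate_ok(add, 2, 3, 5));
    // A wrong private input (2 + 3 != 6) is caught at this point,
    // instead of surfacing later during proof generation.
    assert!(!generic_gate_ok(add, 2, 3, 6));
}
```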
Due to the has_witness variable set to WitnessGeneration, functions will behave differently and compute actual values instead of generating constraints.
Inductive proof systems
Earlier we defined zero-knowledge proofs as proofs of the computation of a function.
We will now go beyond this, and try to define zero-knowledge proof systems for computations that proceed inductively. That is, in pieces, and potentially over different locations involving different parties in a distributed manner.
An example of an inductive computation would be the verification of the Mina blockchain. Each block producer, when they produce a new block,
- verifies that the previous state of the chain was arrived at correctly
- verifies that their VRF evaluation is sufficient to extend the chain
- verifies a transaction proof, corresponding to the correctness of executing a bundle of Mina transactions
You can imagine the whole computation of verifying the chain, from a global view as being chopped up into per-block steps, and then each step is executed by a different block producer, relying on information which is private to that block-producer (for example, the private key needed to evaluate their VRF, which is never broadcast on the network).
But, at the end of the day, we end up with exactly one proof which summarizes this entire computation.
That is what inductive SNARKs (or in my opinion less evocatively recursive SNARKs, or proof-carrying data) allow us to do: create a single proof certifying the correctness of a big computation that occurred in steps, possibly across multiple parties, and possibly with party-local private information.
Ok, so what are inductive SNARKs? Well, first let’s describe precisely the aforementioned class of distributed computations.
Inductive sets
zk-SNARKs as defined earlier allow you to prove for efficiently computable functions statements of the form
I know such that
Another way of looking at this is that they let you prove membership in sets of the form
These are called NP sets. In intuitive terms, an NP set is one in which membership can be efficiently checked given some "witness" or helper information.
Inductive proof systems let you prove membership in sets that are inductively defined. An inductively defined set is one where membership can be efficiently checked given some helper information, but the computation is explicitly segmented into pieces.
Let’s give the definition first and then I will link to a blog post that discusses an example.
We will give a recursive definition of a few concepts. Making this mathematically well-founded would require a little bit of work which is sort of just bureaucratic, so we will not do it. The concepts we’ll define recursively are
- inductive set
- inductive rule
The data of an inductive rule for a type is
- a sequence of inductive sets (note the recursive reference to the notion of inductive set)
- a type
- a function. This function is like our function for NP sets, but it also takes in the previous values from other inductive sets.
The data of an inductive set over a type is
- a sequence of inductive rules for the type.
The subset of corresponding to (which we will for now write ) is defined inductively as follows.
Membership holds if and only if

- there is some inductive rule in the sequence with some function such that
- there exists a witness, and an element of each of the rule's inductive sets, such that the function accepts them
Actually there is a distinction between an inductive set , the type underlying it, and the subset of that type which belongs to that set. But it is messy to explicitly keep track of this distinction for the most part so we will equivocate between the 3 concepts.
Technical note which is fine to ignore, would be more appropriately defined by saying “There exists some efficiently computed ”.
Here the notion of membership in an inductive set is recursively referred to.
See this blog post (TODO: copy content of blogpost here)
Accumulation
Introduction
The trick below was originally described in Halo; however, we are going to base this post on the abstraction of "accumulation schemes" described by Bünz, Chiesa, Mishra and Spooner in Proof-Carrying Data from Accumulation Schemes, in particular the scheme in Appendix A.2.
Relevant resources include:
- Proof-Carrying Data from Accumulation Schemes by Benedikt Bünz, Alessandro Chiesa, Pratyush Mishra and Nicholas Spooner.
- Recursive Proof Composition without a Trusted Setup (Halo) by Sean Bowe, Jack Grigg and Daira Hopwood.
This page describes the most relevant parts of these papers and how it is implemented in Pickles/Kimchi. It is not meant to document the low-level details of the code in Pickles, but to describe what the code aims to do, allowing someone reviewing / working on this part of the codebase to gain context.
Interactive Reductions Between Relations
The easiest way to understand “accumulation schemes” is as a set of interactive reductions between relations. An interactive reduction proceeds as follows:
- The prover and verifier start with some statement ; the prover additionally holds .
- They then run some protocol between them.
- After which, they both obtain and the prover obtains
Pictorially:
With the security/completeness guarantee that:
Except with negligible probability.
In other words: we have reduced membership of to membership of using interaction between the parties; unlike a classical Karp–Levin reduction, the soundness/completeness may rely on random coins and multiple rounds. As foreshadowing, here is a diagram/overview of the reductions used in Pickles (the relations will be described as we go):
As you can see from Fig. 2, we have a cycle of reductions (following the arrows) e.g. we can reduce a relation “” to itself by applying all 4 reductions. This may seem useless: why reduce a relation to itself?
However the crucial point is the “in-degree” (e.g. n-to-1) of these reductions: take a look at the diagram and note that any number of instances can be reduced to a single instance! This instance can then be converted to a single by applying the reductions (moving “around the diagram” by running one protocol after the other):
Note: There are many examples of interactive reductions, an example familiar to the reader is PlonK itself: which reduces circuit-satisfiability ( is the public inputs and is the wire assignments) to openings of polynomial commitments ( are polynomial commitments and evaluation points, is the opening of the commitment).
More Theory/Reflections about Interactive Reductions (click to expand)
As noted in Compressed -Protocol Theory and Practical Application to Plug & Play Secure Algorithmics every Proof-of-Knowledge (PoK) with -rounds for a relation can instead be seen as a reduction to some relation with rounds as follows:
- Letting be the view of the verifier.
- be a ’th-round message which could make the verifier accept.
Hence the relation is the set of verifier views (except for the last round) and the last missing message which would make the verifier accept if sent.
This simple, yet beautiful, observation turns out to be extremely useful: rather than explicitly sending the last-round message (which may be large/take the verifier a long time to check), the prover can instead prove that he knows a last-round message which would make the verifier accept, after all, sending the witness is a particularly simple/inefficient PoK for .
The reductions in this document are all of this form (including the folding argument): receiving/verifying the last-round message would require too many resources (time/communication) of the verifier, hence we instead replace it with yet another reduction to yet another language (e.g. where the witness is now half the size).
Hence we end up with a chain of reductions: going from the languages of the last-round messages. An “accumulation scheme” is just an example of such a chain of reductions which happens to be a cycle. Meaning the language is “self-reducing” via a series of interactive reductions.
A Note On Fiat-Shamir: All the protocols described here are public coin, hence in implementation the Fiat-Shamir transform is applied to avoid interaction: the verifier's challenges are sampled using a hash function (e.g. Poseidon) modelled as a reprogrammable random oracle.
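To make the Fiat-Shamir idea concrete, here is a minimal sketch of a transcript that derives challenges by hashing everything absorbed so far. Rust's `DefaultHasher` is NOT cryptographic and stands in for Poseidon purely to illustrate the data flow; `Transcript`, `absorb` and `challenge` are illustrative names, not the Kimchi API.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Toy Fiat-Shamir transcript: prover messages are absorbed, and challenges
/// are squeezed by hashing the whole transcript (plus a counter).
pub struct Transcript {
    absorbed: Vec<u64>,
    counter: u64,
}

impl Transcript {
    pub fn new() -> Self {
        Transcript { absorbed: Vec::new(), counter: 0 }
    }

    /// Absorb a prover message (modelled as a u64).
    pub fn absorb(&mut self, msg: u64) {
        self.absorbed.push(msg);
    }

    /// Squeeze a verifier challenge: deterministic in the transcript, so the
    /// prover and verifier derive the same value without interaction.
    pub fn challenge(&mut self) -> u64 {
        let mut h = DefaultHasher::new();
        self.absorbed.hash(&mut h);
        self.counter.hash(&mut h);
        self.counter += 1;
        h.finish()
    }
}

pub fn demo() -> bool {
    // Both parties absorb the same messages => they derive the same challenges.
    let mut p = Transcript::new();
    let mut v = Transcript::new();
    for msg in [1u64, 2, 3] {
        p.absorb(msg);
        v.absorb(msg);
    }
    p.challenge() == v.challenge()
}

fn main() {
    assert!(demo());
}
```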
Polynomial Commitment Openings
Recall that the polynomial commitment scheme (PCS) in Kimchi is just the trivial scheme based on Pedersen commitments. For Kimchi we are interested in “accumulation for the language () of polynomial commitment openings”, meaning that:
Where is a list of coefficients for a polynomial .
This is the language we are interested in reducing: providing a trivial proof, i.e. sending , requires communication and verifier time linear in the witness size; we want a poly-log verifier. The communication complexity will be solved by a well-known folding argument; however, to reduce computation we are going to need the “Halo trick”.
First a reduction from PCS to an inner product relation.
Reduction:
Formally the relation of the inner product argument is:
We can reduce to with as follows:
- Define , so that ,
- The verifier adds the evaluation to the commitment “in a new coordinate” as follows:
- Verifier picks and sends to the prover.
- Verifier updates
Intuitively we sample a fresh to avoid a malicious prover “putting something in the -position”: because he must send before seeing , he would need to guess beforehand. If the prover is honest, we should have a commitment of the form:
Note: In some variants of this reduction, is chosen by the verifier as for a constant where ; this also works, however we (in Kimchi) simply hash to the curve to sample .
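As a toy illustration of “adding the evaluation in a new coordinate”, the sketch below models the group as the additive group of integers mod a prime, so “basis points” are just public scalars. This is neither hiding nor binding and only shows the homomorphic bookkeeping of the reduction; all constants and names are illustrative.

```rust
const P: u64 = 2_147_483_647; // 2^31 - 1, a stand-in prime field

fn addm(a: u64, b: u64) -> u64 { (a + b) % P }
fn mulm(a: u64, b: u64) -> u64 { (a as u128 * b as u128 % P as u128) as u64 }

/// Toy "Pedersen" commitment: sum of a_i * g_i in Z_p, with public basis g.
fn commit(a: &[u64], g: &[u64]) -> u64 {
    a.iter().zip(g).fold(0, |acc, (ai, gi)| addm(acc, mulm(*ai, *gi)))
}

fn inner(a: &[u64], b: &[u64]) -> u64 {
    a.iter().zip(b).fold(0, |acc, (x, y)| addm(acc, mulm(*x, *y)))
}

/// The verifier "adds the evaluation in a new coordinate": C' = C + v*U,
/// where U is sampled AFTER C is received. The honest prover's witness for
/// the extended commitment is the vector (a_1, ..., a_n, v).
pub fn reduction_holds() -> bool {
    let a = [3u64, 1, 4, 1];     // polynomial coefficients
    let g = [17u64, 29, 43, 59]; // public basis
    let x = 7u64;                // evaluation point
    let b: Vec<u64> = (0..a.len() as u32).map(|i| x.pow(i) % P).collect();
    let v = inner(&a, &b); // claimed evaluation f(x) = <a, (1, x, x^2, ...)>

    let u = 101u64; // fresh "coordinate" U sampled by the verifier
    let c = commit(&a, &g);
    let c_new = addm(c, mulm(v, u));

    // c_new is exactly a commitment to (a, v) under the extended basis (g, U).
    let mut a_ext = a.to_vec();
    a_ext.push(v);
    let mut g_ext = g.to_vec();
    g_ext.push(u);
    c_new == commit(&a_ext, &g_ext)
}

fn main() {
    assert!(reduction_holds());
}
```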
Reduction:
Note: The folding argument described below is the particular variant implemented in Kimchi, although some of the variable names are different.
The folding argument reduces an inner product with (a power of two) coefficients to an inner product relation with coefficients. To see how it works, let us rewrite the inner product in terms of a first and a second part:
Where and , similarly for .
Now consider a “randomized version” with a challenge of this inner product:
Additional intuition: How do you arrive at the expression above? (click to expand)
The trick is to ensure that ends up in the same power of .
The particular expression above is not special and arguably not the most elegant: simpler alternatives can easily be found and the inversion can be avoided, e.g. by instead using: which will have the same overall effect of isolating the interesting term (this time as the -coefficient). The particular variant above can be found in e.g. Compressed -Protocol Theory and Practical Application to Plug & Play Secure Algorithmics, and proving extraction is somewhat easier than for the variant used in Kimchi.
The term we care about (underlined in magenta) is , the other two terms are cross-term garbage. The solution is to let the prover provide commitments to the cross terms to “correct” this randomized splitting of the inner product before seeing : the prover commits to the three terms (one of which is already provided) and the verifier computes a commitment to the new randomized inner product. i.e.
The prover sends commitment to and cross terms:
The verifier samples and defines:
The final observation in the folding argument is simply that:
Hence we can replace occurrences of by , with this look at the green term:
By defining . We also rewrite the blue term in terms of similarly:
By defining . In summary by computing:
We obtain a new instance of the inner product relation (of half the size):
At this point the prover could send , to the verifier who could verify the claim:
- Computing from and
- Computing from , and
- Checking
This would require half as much communication as the naive proof. A modest improvement…
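A single folding round can be checked numerically. The variant below (a' = a_lo + u * a_hi, b' = b_lo + u^{-1} * b_hi, with scalar cross terms L, R) is one of several equivalent formulations and differs from Kimchi's in naming; the toy field Z_p with p = 2^31 - 1 and all values are illustrative.

```rust
const P: u64 = 2_147_483_647; // 2^31 - 1, prime

fn addm(a: u64, b: u64) -> u64 { (a + b) % P }
fn mulm(a: u64, b: u64) -> u64 { (a as u128 * b as u128 % P as u128) as u64 }
fn powm(mut b: u64, mut e: u64) -> u64 {
    let mut r = 1;
    while e > 0 {
        if e & 1 == 1 { r = mulm(r, b); }
        b = mulm(b, b);
        e >>= 1;
    }
    r
}
fn invm(a: u64) -> u64 { powm(a, P - 2) } // Fermat's little theorem

fn inner(a: &[u64], b: &[u64]) -> u64 {
    a.iter().zip(b).fold(0, |acc, (x, y)| addm(acc, mulm(*x, *y)))
}

/// Fold a vector in half with a challenge: v' = v_lo + u * v_hi.
fn fold(v: &[u64], u: u64) -> Vec<u64> {
    let h = v.len() / 2;
    (0..h).map(|i| addm(v[i], mulm(u, v[h + i]))).collect()
}

/// One folding round: the prover sends the scalar cross terms
/// L = <a_lo, b_hi> and R = <a_hi, b_lo>; after folding with u, the new
/// inner product equals c + u^{-1} * L + u * R.
pub fn folding_round_holds() -> bool {
    let a = [5u64, 2, 8, 3];
    let b = [1u64, 7, 49, 343];
    let u = 11u64;
    let h = a.len() / 2;

    let l = inner(&a[..h], &b[h..]); // cross term L
    let r = inner(&a[h..], &b[..h]); // cross term R

    let a2 = fold(&a, u);
    let b2 = fold(&b, invm(u));

    let expected = addm(inner(&a, &b), addm(mulm(invm(u), l), mulm(u, r)));
    inner(&a2, &b2) == expected
}

fn main() {
    assert!(folding_round_holds());
}
```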
However, we can iteratively apply this transformation until we reach an instance of constant size:
Reduction:
Note that the process above can simply be applied again to the new instance as well. By doing so times, the total communication is brought down to elements, until the instance consists of , at which point the prover simply provides the witness .
Because we need to refer to the terms in the intermediate reductions we let , , be the , , vectors respectively after recursive applications, with , , being the original instance. We denote by the challenge of the ’th application.
Reduction:
While the proof for above has -size, the verifier's time complexity is :
- Computing from using takes .
- Computing from using takes .
The rest of the verifier's computation is only , namely computing:
- Sampling all the challenges .
- Computing for every
However, upon inspection, the naive claim that computing takes turns out not to be true:
Claim: Define , then for all .
Proof: This can be verified by looking at the expansion of . In slightly more detail: an equivalent claim is that where . Let be the bit-decomposition of the index and observe that: Which is simply a special case of the binomial theorem for the product:
Looking at , it can clearly be evaluated in time; computing therefore takes just time!
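The claim can be checked numerically: folding the vector of powers of x down to a single scalar (naively O(n) work) yields exactly the product h(x) = prod_j (1 + u_j * x^{n/2^j}), which costs only O(log n) to evaluate. The sketch below uses the b' = b_lo + u * b_hi folding variant over a toy field; it is illustrative, not the Kimchi implementation.

```rust
const P: u64 = 2_147_483_647; // 2^31 - 1, a stand-in prime field

fn addm(a: u64, b: u64) -> u64 { (a + b) % P }
fn mulm(a: u64, b: u64) -> u64 { (a as u128 * b as u128 % P as u128) as u64 }
fn powm(mut b: u64, mut e: u64) -> u64 {
    let mut r = 1;
    while e > 0 {
        if e & 1 == 1 { r = mulm(r, b); }
        b = mulm(b, b);
        e >>= 1;
    }
    r
}

/// Fold b' = b_lo + u * b_hi (the same folding applied to the b-vector).
fn fold(v: &[u64], u: u64) -> Vec<u64> {
    let h = v.len() / 2;
    (0..h).map(|i| addm(v[i], mulm(u, v[h + i]))).collect()
}

/// Fold b = (1, x, x^2, ..., x^{n-1}) with k challenges; the surviving
/// scalar equals the closed-form product h(x) = prod_j (1 + u_j * x^{n/2^j}).
pub fn b_is_structured() -> bool {
    let x = 5u64;
    let k = 4;
    let n = 1usize << k; // n = 16
    let us = [3u64, 7, 11, 13]; // folding challenges u_1..u_k

    // Naive: fold the explicit vector of powers of x (O(n) work total).
    let mut b: Vec<u64> = (0..n as u32).map(|i| powm(x, i as u64)).collect();
    for &u in &us {
        b = fold(&b, u);
    }

    // Fast: the O(log n) closed-form product.
    let mut h = 1u64;
    for (j, &u) in us.iter().enumerate() {
        let e = (n >> (j + 1)) as u64; // exponent n / 2^{j+1 - 1 rounds} = n/2, n/4, ...
        h = mulm(h, addm(1, mulm(u, powm(x, e))));
    }
    b[0] == h
}

fn main() {
    assert!(b_is_structured());
}
```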
The “Halo Trick”
The “Halo trick” resides in observing that this is also the case for , since it is folded the same way as . It is not hard to convince oneself (using the same type of argument as above) that:
Where are the coefficients of (just as are the coefficients of ), i.e.
For notational convenience (and to emphasize that they are 1-dimensional vectors), define/replace:
With this we define the “accumulator language” which states that “” was computed correctly:
Note: since there is no witness for this relation anyone can verify the relation (in time) by simply computing in linear time. Instances are also small: the size is dominated by which is .
In The Code: in the Kimchi code, is called prev_challenges and is called comm, both defined in the RecursionChallenge struct.
Now, using the new notation rewrite as:
Note: It is the same relation, we just replaced some names (, ) and simplified a bit: inner products between 1-dimensional vectors are just multiplications.
We now have all the components to reduce (with no soundness error) as follows:
- Prover sends to the verifier.
- Verifier does:
- Compute
- Checks
- Output
Note: The above can be optimized, in particular there is no need for the prover to send .
Reduction:
Tying the final knot in the diagram.
The expensive part in checking consists in computing given the describing : first expanding into , then computing the MSM. However, by observing that is actually a polynomial commitment to , which we can evaluate at any point using operations, we arrive at a simple strategy for reducing any number of such claims to a single polynomial commitment opening:
- Prover sends to the verifier.
- Verifier samples , and computes:
And outputs the following claim:
i.e. the polynomial commitment opens to at . The prover has the witness:
Why is this a sound reduction? If one of the does not commit to , then they disagree except on at most points, hence with probability . Taking a union bound over all terms leads to soundness error , which is negligible.
The reduction above requires operations and operations.
Addition of Polynomial Relations: additional polynomial commitments (i.e. from PlonK) can be added to the randomized sums above and opened at as well: in which case the prover proves the claimed openings at before sampling the challenge .
This is done in Kimchi/Pickles: the and above is the same as in the Kimchi code.
The combined (including both the evaluations and polynomial commitment openings at and ) is called combined_inner_product in Kimchi.
This instance is reduced back into a single instance, which is included with the proof.
Multiple Accumulators (the case of PCD): From the section above it may seem like there is always going to be a single instance. This is indeed the case if the proof only verifies a single proof, called “Incremental Verifiable Computation” (IVC) in the literature. If the proof verifies multiple proofs, called “Proof-Carrying Data” (PCD), then there will be multiple accumulators: every “input proof” includes an accumulator ( instance), and all of these are combined into the new (single) instance included in the new proof. This way, if one of the original proofs included an invalid accumulator and therefore did not verify, the new proof will also include an invalid accumulator and not verify, with overwhelming probability.
Note that the new proof contains the (single) accumulator of each “input” proof, even though the proofs themselves are part of the witness: this is because verifying each input proof results in an accumulator (which could then be checked directly – however this is expensive): the “result” of verifying a proof is an accumulator (instance of ) – which can be verified directly or further “accumulated”.
These accumulators are the RecursionChallenge structs included in a Kimchi proof.
The verifier checks the PlonK proof (which proves accumulation for each “input proof”); this results in some number of polynomial relations, and these are combined with the accumulators of the “input” proofs to produce the new accumulator.
Accumulation Verifier
The section above implicitly describes the work the verifier must do, but for the sake of completeness let us explicitly describe what the verifier must do to verify a Fiat-Shamir compiled proof of the transformations above. This constitutes “the accumulation” verifier which must be implemented “in-circuit” (in addition to the “Kimchi verifier”), the section below also describes how to incorporate the additional evaluation point (“the step”, used by Kimchi to enforce constraints between adjacent rows). Let be the challenge space (128-bit GLV decomposed challenges):
- PlonK verifier on outputs polynomial relations (in Purple in Fig. 4).
- Checking and polynomial relations (from PlonK) to (the dotted arrows):
  - Sample (evaluation point) using the Poseidon sponge.
  - Read claimed evaluations at and (ProofEvaluations).
  - Sample (commitment combination challenge) using the Poseidon sponge.
  - Sample (evaluation combination challenge) using the Poseidon sponge.
  - Compute with from:
    - (RecursionChallenge.comm)
    - Polynomial commitments from PlonK (ProverCommitments)
  - Compute (part of combined_inner_product) with from:
    - The evaluations of
    - Polynomial openings from PlonK (ProofEvaluations) at
  - Compute (part of combined_inner_product) with from:
    - The evaluations of
    - Polynomial openings from PlonK (ProofEvaluations) at
At this point we have two PCS claims; these are combined using a random linear combination with in the inner product argument (see [Different functionalities](/plonk/inner_product_api.html) for details).
-
Checking :
- Sample using the Poseidon sponge: hash to curve.
- Compute .
- Update .
-
Checking :
Check the correctness of the folding argument, for every :
- Receive (see the vector OpeningProof.lr in Kimchi).
- Sample using the Poseidon sponge.
- Compute (using GLV endomorphism).
  (Note: to avoid the inversion, the element is witnessed and the verifier checks . To understand why computing the field inversion would be expensive, see deferred computation.)
-
Checking
- Receive from the prover.
- Define from (folding challenges, computed above).
- Compute , this works since: See Different functionalities for more details or the relevant code.
- Compute (i.e. s.t. )
Note that the accumulator verifier must be proven (in addition to the Kimchi/PlonK verifier) for each input proof.
No Cycles of Curves?
Note that the “cycles of curves” (e.g. Pasta cycle) does not show up in this part of the code:
a separate accumulator is needed for each curve and the final verifier must check both accumulators to deem the combined recursive proof valid.
This takes the form of passthrough data in pickles.
Note however, that the accumulation verifier makes use of both -operations and -operations, therefore it (like the Kimchi verifier) also requires deferred computation.
Deferred Computation
Let and be the two fields, with , for elliptic curves and . Assume . We have a proof system (Kimchi) over and , where commitments to public inputs are:
Respectively. See Pasta Curves for more details.
When referring to the -side we mean the proof system for circuit over the field .
Public Inputs / Why Passing
In pickles-rs we have the notion of “passing” a variable (including the transcript) from one side to the other. e.g. when a field element needs to be used as a scalar on .
This document explains what goes on “under the hood”. Let us start by understanding why:
Let be a scalar which we want to use to do both:
- Field arithmetic in
- Scalar operations on
In order to do so efficiently, we need to split these operations across two circuits (and therefore proofs) because:
- Emulating arithmetic in is very expensive, e.g. computing requires multiplications over : 100’s of gates for a single multiplication.
- We cannot compute over efficiently because, as before, emulating arithmetic in is very expensive…
Solution
The solution is to “pass” a value between the two proofs, in other words to have two values and which are equal as integers, i.e. : they represent “the same number”. A naive first attempt would be to simply add to the witness on the -side; however, this has two problems:
Insufficient Field Size: hence cannot fit in .
No Binding: More concerning, there is no binding between the in the -witness and the in the -witness: a malicious prover could choose completely unrelated values. This violates soundness of the overall -relation being proved.
Problem 1: Decompose
The solution to the first problem is simple:
On the -side, decompose with (high bits) and (low bit). Note that (always the case for any cycle of curves: is only smaller than by the Hasse bound). Now “” is “represented” by the two values .
Note that no decomposition is necessary if the “original value” was in , since is big enough to hold the lift of any element in .
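The decomposition can be sketched as follows; integers stand in for field elements and the names are illustrative:

```rust
/// Toy decomposition of a scalar x (an element of the larger field, modelled
/// as an integer) into (high bits, low bit), so that each part fits in the
/// smaller field: x = 2 * x_high + x_low_bit.
pub fn decompose(x: u64) -> (u64, u64) {
    (x >> 1, x & 1)
}

pub fn recompose(high: u64, low: u64) -> u64 {
    2 * high + low
}

fn main() {
    let x = 0b1011_0110u64; // 182
    let (hi, lo) = decompose(x);
    assert_eq!(recompose(hi, lo), x);
    assert_eq!(lo, 0); // the low bit of 182
}
```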
Problem 2: Compute Commitment to the Public Input of other side
To solve the binding issue we will add to the public inputs on the -side. For simplicity we will describe the case where are the only public inputs on the -side, which means that the commitment to the public inputs on the -side is:
At this point it is important to note that is defined over !
Which means that we can compute efficiently on the -side!
Therefore to enforce the binding, we:
- Add a sub-circuit which checks:
- Add to the public input on the -side.
We recurse onwards…
At this point the statement of the proof on the -side is: the -proof is sound, conditioned on providing an opening of that satisfies the -relation.
At this point you can stop and verify the proof (in the case of a “step proof” you would), by recomputing outside the circuit while checking the -relation manually “in the clear”.
However, when recursing (e.g. wrapping in a “wrap proof”) we need to “ingest” this public input ; after all, to avoid blowup in the proof size everything (proofs/accumulators/public inputs etc.) must eventually become part of the witness and every computation covered by a circuit…
To this end, the wrap proof is a proof for the -relation with the public input which additionally verifies the -proof.
The next “step” proof then verifies this wrap proof which means that then becomes part of the witness!
In Pickles
We can arbitrarily choose which side should compute the public input of the other, in pickles we let “wrap” compute the commitment to the public input.
Enforcing Equality
This enforces that the public input of the proof verified on the Fr side is equal to the Fp input computed on the Fr side.
Passthrough and Me-Only
A More Efficient Approach to Zero Knowledge for PLONK
In PLONK (considered as an interactive oracle proof), the prover sends the verifier several polynomials. They are evaluated at some points during the course of the protocol. Of course, if we want zero-knowledge, we would require that those evaluations do not reveal anything about the proof’s underlying witness.
PLONK as described here achieves zero knowledge by multiplying the polynomials with a small-degree polynomial of random coefficients. When PLONK is instantiated with a discrete-log-based Bootle et al. style polynomial commitment scheme, the polynomial degrees must be padded to the nearest power of two. As a result, since several of the polynomials in PLONK already have degree equal to a power of two before the zero-knowledge masking, the multiplication with a random polynomial pushes the degree to the next power of two, which hurts efficiency. To avoid this, we propose an alternative method of achieving zero knowledge.
Zero Knowledge for the Column Polynomials
Let be the number of rows in a PLONK constraint system. For a typical real-world circuit, will not be equal to a power of two.
Let the witness elements from one column be . Let be the closest power of two to such that . Let be the field that witness elements belong to.
Now, in vanilla PLONK, we pad the with elements, interpolate the polynomial over a domain of size , scale it by a low degree random polynomial, and commit to the resulting polynomial. We want to avoid increasing the degree with the scaling by a low degree polynomial, so consider the following procedure.
Procedure. Sample elements uniformly from : . Append them to the tuple of witness elements and then pad the remaining places as zeroes. The resulting tuple is interpolated as the witness polynomial. This approach ensures zero knowledge for the witness polynomials as established by Lemma 1.
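The procedure can be sketched as follows. Field elements are modelled as integers and a toy deterministic generator stands in for a real RNG; `pad_witness` and its parameters are illustrative names, not the Kimchi API.

```rust
/// Sketch of the padding procedure: append w uniformly random masking
/// elements to the n witness values, then pad with zeroes up to the
/// domain size d >= n + w.
pub fn pad_witness(
    witness: &[u64],
    w: usize,
    d: usize,
    rng: &mut impl FnMut() -> u64,
) -> Vec<u64> {
    assert!(witness.len() + w <= d);
    let mut out = witness.to_vec();
    for _ in 0..w {
        out.push(rng()); // random masking values (real code: uniform field elements)
    }
    out.resize(d, 0); // remaining places are zero
    out
}

fn main() {
    // Toy deterministic "rng" (LCG constants), for illustration only.
    let mut state = 1u64;
    let mut rng = move || {
        state = state.wrapping_mul(6364136223846793005).wrapping_add(1);
        state
    };
    let padded = pad_witness(&[10, 20, 30], 2, 8, &mut rng);
    assert_eq!(padded.len(), 8);
    assert_eq!(padded[..3], [10, 20, 30]); // witness values untouched
    assert_eq!(padded[5..], [0, 0, 0]); // zero padding after the masks
}
```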
Lemma 1. Let be a domain of size . Let . Let be uniformly and independently random elements in . Let be the -tuple . Let be an interpolation polynomial of degree such that , where . Let be any elements in such that for every . Then, is distributed uniformly at random in .
Proof sketch. Recall that the interpolation polynomial is
With as random variables, we have for some constant field elements . Therefore, assigning random values to gives degrees of freedom, which lets be distributed uniformly at random in .
Zero Knowledge for the Permutation Polynomial
The other polynomial in PLONK for which we need zero knowledge is the “permutation polynomial” . The idea here is to set the last evaluations to be uniformly random elements in . Then, we'll modify the verification equation so that it does not check those values for the permutation property.
Modified permutation polynomial. Specifically, set as follows.
From Lemma 1, the above has the desired zero knowledge property when evaluations are revealed. However, we need to modify the other parts of the protocol so that the last elements are not subject to the permutation evaluation, since they will no longer satisfy the permutation check. Specifically, we will need to modify the permutation polynomial to disregard those random elements, as follows.
Modified permutation checks. To recall, the permutation check was originally as follows. For all ,
The modified permutation checks, which ensure that the check is performed on all values except the last elements in the witness polynomials, are as follows.
-
For all ,
-
For all ,
-
For all ,
In the modified permutation polynomial above, the multiple ensures that the permutation check is performed on all values except the last elements in the witness polynomials.
Understanding the implementation of the check
Unlike the latest version of vanilla PLONK, which implements the final check using a polynomial opening (via Maller’s optimization), we implement it manually. (That is to say, Izaak implemented Maller’s optimization for 5-wires.)
But the check is not exactly . This note describes how and why the implementation deviates a little.
What’s f and what’s missing in the final equation?
If you look at how the evaluation of is computed on the prover side (or the commitment of is computed on the verifier side), you can see that f is missing two things:
- the public input part
- some terms in the permutation
What is it missing in the permutation part? Let’s look more closely. This is what we have:
In comparison, here is the list of the stuff we needed to have:
- the two sides of the coin: with
- and
- the initialization:
- and the end of the accumulator:
You can read more about why it looks like that in this post.
We can see that the permutation part we have in f is clearly lacking: it’s just the subtraction part (equation 2), and even that is missing some terms. It is missing exactly this:
So at the end, when we have to check for the identity we’ll actually have to check something like this (I colored the missing parts on the left hand side of the equation):
This is not exactly what happens in the code. But if we move things around a bit, we end up with what’s implemented. I highlight what changes in each step. First, we just move things around:
here are the actual lagrange basis calculated with the formula here, oh and we actually use in the code, not , so let’s change that as well:
finally we extract some terms from the lagrange basis:
with
Why do we do things this way? Most likely to reduce
Also, about the code:
- the division by in the and the terms is missing (why?)
- the multiplication by in the term is missing (why?)
As these terms are constants and do not prevent the division by to form , also omits them. In other words, they cancel one another.
What about ?
In verifier.rs you can see the following code (cleaned to remove anything not important)
```rust
#![allow(unused)]
fn main() {
    // compute the witness polynomials $w_0, \cdots, w_{14}$ and the permutation
    // polynomial $z$ in evaluation form on different domains
    let lagrange = index.cs.evaluate(&w, &z);

    // compute the parts of the permutation argument that are included in t
    let (perm, bnd) = index.cs.perm_quot(&lagrange, &oracles, &z, &alpha[range::PERM])?;

    // divide the contributions by the vanishing polynomial
    let (mut t, res) = (perm + some_other_stuff).divide_by_vanishing_poly();

    // add the other permutation term
    t += &bnd;

    // t is evaluated at zeta and sent...
}
```
Notice that bnd is not divided by the vanishing polynomial. Also what’s perm? Let’s answer the latter:
and bnd is:
you can see that some terms are missing in bnd, specifically the numerator . Well, it would have been cancelled by the division by the vanishing polynomial, and this is the reason why we didn’t divide that term by .
Also, note that the same constant terms that were missing in are missing in .
Maller’s optimization for kimchi
This document proposes a protocol change for kimchi.
What is Maller’s optimization?
See the section on Maller’s optimization for background.
Overview
We want the verifier to form commitment to the following polynomial:
They could do this like so:
Since they already know , they can produce ; the only thing they need is . So the protocol looks like this:

In the rest of this document, we review the details and considerations needed to implement this change in kimchi.
How to deal with a chunked ?
There’s one challenge that prevents us from directly using this approach: is typically sent and received as several commitments (called chunks or segments throughout the codebase), since is the largest polynomial, usually exceeding the size of the SRS (which is by default set to be the size of the domain).
The verifier side
Thus, the verifier will have to produce split commitments of and combine them with powers of to verify an evaluation proof. Let’s define as:
where every is of degree at most. Then we have that
which the verifier can’t produce because of the powers of , but we can linearize it, since we already know the point at which we’re going to evaluate that polynomial:
The prover side
This means that the prover will produce evaluation proofs on the following linearized polynomial:
which is the same as only if evaluated at . As the previous section pointed out, we will need and .
Evaluation proof and blinding factors
To create an evaluation proof, the prover also needs to produce the blinding factor associated with the chunked commitment of:
To compute it, there are two rules to follow:
- when two commitments are added together, their associated blinding factors get added as well:
- when scaling a commitment, its blinding factor gets scaled too:
As such, if we know and , we can compute:
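The two rules are easy to check on a toy Pedersen commitment in the additive group of Z_p. This toy scheme is neither hiding nor binding, and the bases G, H are arbitrary illustrative constants; only the homomorphic bookkeeping matters here.

```rust
const P: u64 = 2_147_483_647; // 2^31 - 1, a stand-in prime field

fn addm(a: u64, b: u64) -> u64 { (a + b) % P }
fn mulm(a: u64, b: u64) -> u64 { (a as u128 * b as u128 % P as u128) as u64 }

// Arbitrary public "bases" (stand-ins for group elements).
const G: u64 = 48_271;
const H: u64 = 16_807;

/// Toy Pedersen commitment: com(m, r) = m*G + r*H in Z_p.
fn commit(m: u64, r: u64) -> u64 {
    addm(mulm(m, G), mulm(r, H))
}

/// The two rules for tracking blinding factors:
///  - adding commitments adds the blinders:
///      com(m1, r1) + com(m2, r2) = com(m1 + m2, r1 + r2)
///  - scaling a commitment scales its blinder:
///      k * com(m, r) = com(k*m, k*r)
pub fn rules_hold() -> bool {
    let (m1, r1) = (42u64, 1111u64);
    let (m2, r2) = (99u64, 2222u64);
    let k = 7u64;

    let sum_rule =
        addm(commit(m1, r1), commit(m2, r2)) == commit(addm(m1, m2), addm(r1, r2));
    let scale_rule =
        mulm(k, commit(m1, r1)) == commit(mulm(k, m1), mulm(k, r1));
    sum_rule && scale_rule
}

fn main() {
    assert!(rules_hold());
}
```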
The prover knows the blinding factor of the commitment to , as they committed to it. But what about ? They never commit to it really, and the verifier recreates it from scratch using:
- The commitments we sent them. In the linearization process, the verifier actually gets rid of most prover commitments, except for the commitment to the last sigma polynomial . (TODO: link to the relevant part in the spec) As this commitment is public, it is not blinded.
- The public commitments. Think commitment to selector polynomials or the public input polynomial. These commitments are not blinded, and thus do not impact the calculation of the blinding factor.
- The evaluations we sent them. Instead of using commitments to the wires when recreating , the verifier uses the (verified) evaluations of these in . If we scale our commitment with any of these scalars, we will have to do the same with .
Thus, the blinding factor of is .
The evaluation of
The prover actually does not send a commitment to the full polynomial. As described in the last check section, the verifier will have to compute the evaluation of themselves, because it won’t be zero. It should be equal to the following:
Because we use the inner product polynomial commitment, we also need:
Notice the . That evaluation must be sent as part of the proof as well.
The actual protocol changes
Now here’s how we need to modify the current protocol:
- The evaluations (and their corresponding evaluation proofs) don’t have to be part of the protocol anymore.
- The prover must still send the chunked commitments to .
- The prover must create a linearized polynomial by creating a linearized polynomial and a linearized polynomial and computing:
- While the verifier can compute the evaluation of by themselves, they don’t know the evaluation of , so the prover needs to send that.
- The verifier must recreate , the commitment to , themselves so that they can verify the evaluation proofs of both and .
- The evaluation of must be absorbed in both sponges (Fq and Fr).

The proposal is implemented in #150 with the following details:
- the polynomial is called .
- the evaluation of is called .
- the evaluation is called
RFC: Plookup in kimchi
In 2020, plookup showed how to create lookup proofs: proofs that some witness values are part of a lookup table. Two years later, an independent team published plonkup, showing how to integrate Plookup into Plonk.
This document specifies how we integrate plookup in kimchi. It assumes that the reader understands the basics behind plookup.
Overview
We integrate plookup in kimchi with the following differences:
- we snake-ify the sorted table instead of wrapping it around (see later)
- we allow fixed-ahead-of-time linear combinations of columns of the queries we make
- we only use a single table (XOR) at the moment of this writing
- we allow several lookups (or queries) to be performed within the same row
- zero-knowledgeness is added in a specific way (see later)
The following document explains the protocol in more detail.
Recap on the grand product argument of plookup
As per the Plookup paper, the prover will have to compute three vectors:
- , the (secret) query vector, containing the witness values that the prover wants to prove are part of the lookup table.
- , the (public) lookup table.
- , the (secret) concatenation of and , sorted by (where elements are listed in the order they are listed in ).
Essentially, plookup proves that all the elements in are indeed in the lookup table if and only if the following multisets are equal:
where is a new multiset derived by applying a “randomized difference” to every successive pair of a vector. For example:
Note: This assumes that the lookup table is a single column. You will see in the next section how to address lookup tables with more than one column.
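A sketch of the “randomized difference” operation over a toy field; the name `randomized_difference` and the constants are illustrative:

```rust
const P: u64 = 2_147_483_647; // 2^31 - 1, a stand-in prime field

fn addm(a: u64, b: u64) -> u64 { (a + b) % P }
fn mulm(a: u64, b: u64) -> u64 { (a as u128 * b as u128 % P as u128) as u64 }

/// The "randomized difference" of a vector v under a challenge beta:
/// every successive pair (v_i, v_{i+1}) is mapped to v_i + beta * v_{i+1}.
pub fn randomized_difference(v: &[u64], beta: u64) -> Vec<u64> {
    v.windows(2).map(|w| addm(w[0], mulm(beta, w[1]))).collect()
}

fn main() {
    // e.g. with beta = 2: (1, 5, 5, 7) -> (1+2*5, 5+2*5, 5+2*7) = (11, 15, 19)
    assert_eq!(randomized_difference(&[1, 5, 5, 7], 2), vec![11, 15, 19]);
}
```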
The equality between the multisets can be proved with the permutation argument of plonk, which would look like enforcing constraints on the following accumulator:
- init:
- final:
- for every :
Note that the plookup paper uses a slightly different equation to make the proof work. I believe the proof would work with the above equation, but for simplicity let’s just use the equation published in plookup:
Note: in plookup is too large, and so needs to be split into multiple vectors to enforce the constraint at every . We ignore this for now.
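The multiset-equality mechanism behind the accumulator can be sketched schematically with a generic grand product z_{i+1} = z_i * (gamma + a_i) / (gamma + b_i). Note this is a simplification of the actual plookup recurrence (which uses the beta, gamma-randomized terms above); it is shown only to illustrate why the accumulator returns to 1 exactly when the multisets match (with high probability over gamma).

```rust
const P: u64 = 2_147_483_647; // 2^31 - 1, prime

fn addm(a: u64, b: u64) -> u64 { (a + b) % P }
fn mulm(a: u64, b: u64) -> u64 { (a as u128 * b as u128 % P as u128) as u64 }
fn powm(mut b: u64, mut e: u64) -> u64 {
    let mut r = 1;
    while e > 0 {
        if e & 1 == 1 { r = mulm(r, b); }
        b = mulm(b, b);
        e >>= 1;
    }
    r
}
fn invm(a: u64) -> u64 { powm(a, P - 2) } // Fermat's little theorem

/// Schematic grand-product accumulator for multiset equality of two
/// equal-length vectors a and b:
///   z_0 = 1,  z_{i+1} = z_i * (gamma + a_i) / (gamma + b_i),
/// and the check passes iff z_n = 1 (w.h.p. over gamma when {a} = {b}).
pub fn multisets_equal(a: &[u64], b: &[u64], gamma: u64) -> bool {
    assert_eq!(a.len(), b.len());
    let mut z = 1u64;
    for (x, y) in a.iter().zip(b) {
        z = mulm(z, mulm(addm(gamma, *x), invm(addm(gamma, *y))));
    }
    z == 1
}

fn main() {
    // Same multiset in a different order => the accumulator returns to 1.
    assert!(multisets_equal(&[3, 1, 4, 1], &[1, 4, 1, 3], 97));
    // Different multisets => the check fails for this gamma.
    assert!(!multisets_equal(&[3, 1, 4, 1], &[1, 4, 2, 3], 97));
}
```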
Lookup tables
Kimchi uses a single lookup table at the moment of this writing: the XOR table. The XOR table for values of 1 bit is the following:
| l | r | o |
|---|---|---|
| 1 | 0 | 1 |
| 0 | 1 | 1 |
| 1 | 1 | 0 |
| 0 | 0 | 0 |
Kimchi, however, uses the XOR table for values of 4 bits, which has entries.
Note: the (0, 0, 0) entry is at the very end on purpose (as it will be used as dummy entry for rows of the witness that don’t care about lookups).
Querying the table
The plookup paper handles a vector of lookups $f$, which we do not have directly. So the first step is to create such a vector from the witness columns (or registers). To do this, we define the following objects:
- a query tells us what registers, in what order, and scaled by how much, are part of a query
- a query selector tells us which rows are using the query. It is pretty much the same as a gate selector.
Let’s go over the first item in this section.
For example, the following query tells us that we want to check if $r_0 \oplus r_2 = 2 \cdot r_1$:
| l | r | o |
|---|---|---|
| 1, r0 | 1, r2 | 2, r1 |
The grand product argument for the lookup constraint will look like this at this point:
Not all rows need to perform queries into a lookup table. We will use a query selector in the next section to make the constraints work with this in mind.
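A small sketch of how one query's (coefficient, register) pairs collapse into a single looked-up value using the joint combiner $j$, with toy integer values. The name `combine_query` and the sample registers are illustrative:

```python
def combine_query(query, registers, j):
    # query: list of (coefficient, register_index), one pair per table
    # column; column k is scaled by j^k before summing, mirroring how the
    # table's columns are combined with the joint combiner.
    return sum(coeff * registers[reg] * j**k
               for k, (coeff, reg) in enumerate(query))

registers = [5, 3, 6, 0]           # toy witness row: r0..r3
query = [(1, 0), (1, 2), (2, 1)]   # the example query (1,r0 | 1,r2 | 2,r1)
v = combine_query(query, registers, 7)
assert v == 1*5 + 7*(1*6) + 49*(2*3)   # 5 + 42 + 294 = 341
```

The same combiner powers are applied to the table's columns, so a combined query value matches a combined table entry exactly when the per-column relation holds.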
Query selector
The associated query selector tells us on which rows the query into the XOR lookup table occurs.
| row | query selector |
|---|---|
| 0 | 1 |
| 1 | 0 |
Both the (XOR) lookup table and the query are built-ins in kimchi. The query selector is derived from the circuit at setup time. Currently only the ChaCha gates make use of the lookups.
The grand product argument for the lookup constraint looks like this now:
where $f$ is constructed so that a dummy query ($0 \oplus 0 = 0$) is used on rows that don't have a query.
Queries, not query
Since we allow multiple queries per row, we define multiple queries, where each query is associated with a lookup selector.
At the moment of this writing, the ChaCha gates all perform $4$ queries in a row. Thus, $4$ is trivially the largest number of queries that happen in a row.
Important: to make constraints work, this means that each row must make $4$ queries. (Potentially some or all of them are dummy queries.)
For example, the ChaCha0, ChaCha1, and ChaCha2 gates will apply the following 4 XOR queries on the current and following rows:
| l | r | o | - | l | r | o | - | l | r | o | - | l | r | o |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1, r3 | 1, r7 | 1, r11 | - | 1, r4 | 1, r8 | 1, r12 | - | 1, r5 | 1, r9 | 1, r13 | - | 1, r6 | 1, r10 | 1, r14 |
which you can understand as checking, for the current and following row, that $r_3 \oplus r_7 = r_{11}$, $r_4 \oplus r_8 = r_{12}$, $r_5 \oplus r_9 = r_{13}$, and $r_6 \oplus r_{10} = r_{14}$.
The ChaChaFinal gate also performs $4$ (somewhat similar) queries in the XOR lookup table. In total this is $8$ different queries that could be associated to $8$ selector polynomials.
Grouping queries by queries pattern
Associating each query with a selector polynomial is not necessarily efficient. To summarize:
- the `ChaCha0`, `ChaCha1`, and `ChaCha2` gates make $4$ queries into the XOR table
- the `ChaChaFinal` gate makes $4$ different queries into the XOR table
Using the previous section's method, we'd have to use $8$ different lookup selector polynomials for each of the $8$ different queries. Since there's only $2$ use-cases, we can simply group them by query patterns to reduce the number of lookup selector polynomials to $2$.
The grand product argument for the lookup constraint looks like this now:
where $f$ is constructed as:
where, for example, the first pattern for the `ChaCha0`, `ChaCha1`, and `ChaCha2` gates looks like this:
Note:
- there are now 4 dummy queries, and they only appear when none of the lookup selectors are active
- if a pattern uses fewer than 4 queries, it has to pad its queries with dummy queries as well
Back to the grand product argument
There are two things that we haven’t touched on:
- The vector $t$ representing the combined lookup table (after its columns have been combined with a joint combiner $j$). The non-combined lookup table is fixed at setup time and derived based on the lookup tables used in the circuit (for now only one, the XOR lookup table, can be used in the circuit).
- The vector $s$ representing the sorted multiset of both the queries and the lookup table. This is created by the prover and sent as a commitment to the verifier.
The first vector is quite straightforward to think about:
- if it is smaller than the domain (of size $n$), then we can repeat the last entry enough times to make the table of size $n$.
- if it is larger than the domain, then we can either increase the domain or split the vector in two (or more) vectors. This is most likely what we will have to do to support multiple lookup tables later.
What about the second vector?
The sorted vector
The second vector $s$ is of size
$$n \cdot 4 + |t|$$
That is, it contains the elements of each query vector (the actual values being looked up, after being combined with the joint combiner, that's $4$ per row), as well as the elements of our lookup table $t$ (after being combined as well).
Because the vector $s$ is larger than the domain size $n$, it is split into several vectors of size $n$. Specifically, in the plonkup paper, $s$ is split into two halves (which are then interpolated as $h_1$ and $h_2$).
Since one must compute the difference of every contiguous pair, the last element of the first half is replicated as the first element of the second half, and a separate constraint enforces that continuity on the interpolated polynomials $h_1$ and $h_2$:
$$h_1(g^{n-1}) = h_2(1)$$
which is equivalent to checking that $s_{n-1} = s_n$.
The sorted vector in kimchi
Since this vector is known only by the prover, and is evaluated as part of the protocol, zero-knowledge must be added to the polynomial. To do this in kimchi, we use the same technique as with the other prover polynomials: we randomize the last evaluations (or rows, on the domain) of the polynomial.
This means two things for the lookup grand product argument:
- we cannot use the wrap-around trick to make sure that the list is split in two correctly (enforced by $h_1(g^{n-1}) = h_2(1)$, which is equivalent to $s_{n-1} = s_n$ in the plookup paper)
- we have even less space to store an entire query vector. This is actually fine, as the witness also has some zero-knowledge rows at the end that should not be part of the queries anyway.
The first problem can be solved in two ways:
- Zig-zag technique. By reorganizing $s$ to alternate its values between the columns. For example, $h_1 = (s_0, s_2, s_4, \ldots)$ and $h_2 = (s_1, s_3, s_5, \ldots)$, so that you can simply write the denominator of the grand product argument as
  $$(\gamma(1+\beta) + h_1(x) + \beta h_2(x))(\gamma(1+\beta) + h_2(x) + \beta h_1(x \cdot g))$$
  This is what the plonkup paper does.
- Snake technique. By reorganizing $s$ as a snake. This is what is done in kimchi currently.
The snake technique rearranges into the following shape:
_ _
| | | | |
| | | | |
|_| |_| |
so that the denominator becomes the following equation:
$$(\gamma(1+\beta) + h_1(x) + \beta h_1(x \cdot g))(\gamma(1+\beta) + \beta h_2(x) + h_2(x \cdot g))$$
and the snake doing a U-turn is constrained via something like
$$h_1(g^{n-1}) = h_2(g^{n-1})$$
If there's an $h_3$ (because the table is very large, for example), then you'd have something like:
$$(\gamma(1+\beta) + h_1(x) + \beta h_1(x \cdot g))(\gamma(1+\beta) + \beta h_2(x) + h_2(x \cdot g))(\gamma(1+\beta) + h_3(x) + \beta h_3(x \cdot g))$$
with the added U-turn constraint:
$$h_2(1) = h_3(1)$$
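The snake layout can be sketched quickly in Python: even columns are written top-down and odd columns bottom-up, so the ends of adjacent columns meet at the U-turns. Names and toy values are illustrative:

```python
def snake_columns(s, n):
    # chop the sorted vector s into columns of height n;
    # odd columns are written bottom-up (reversed) to form the snake
    cols = [s[i:i + n] for i in range(0, len(s), n)]
    return [c if i % 2 == 0 else c[::-1] for i, c in enumerate(cols)]

s = [0, 0, 1, 1, 2, 2, 3, 3]
cols = snake_columns(s, 4)
assert cols == [[0, 0, 1, 1], [3, 3, 2, 2]]
# the U-turn: both columns end (at row n-1) on adjacent values of s,
# which is where the turn constraint is enforced
assert (cols[0][-1], cols[1][-1]) == (s[3], s[4])
```

Because odd columns run upward, consecutive rows of a column always hold consecutive entries of $s$ (in one direction or the other), which is what lets the per-row randomized differences still cover the whole sorted vector.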
Unsorted $t$ in $s$
Note that at setup time, $t$ cannot be sorted as it is not combined yet. Since $s$ needs to be sorted by $t$ (in other words, not sorted, but sorted following the elements of $t$), there are two solutions:
- both the prover and the verifier can sort the combined $t$, so that $s$ can be sorted via the typical sorting algorithms
- the prover can sort $s$ by $t$, so that the verifier doesn't have to do any sorting and can just rely on the commitment of the columns of $t$ (which the prover can evaluate in the protocol).
We do the second one, but there is an edge-case: the combined entries can repeat. For some $i \neq j$ such that $t_i = t_j$, there is more than one way of placing the matching query values in $s$, and only the orderings that keep all copies of the repeated entry together follow the order of $t$ correctly.
Recap
So to recap, to create the sorted polynomials $h_i$, the prover:
- creates a large query vector which contains the concatenation of the per-row (combined with the joint combiner) queries (that might contain dummy queries) for all rows
- creates the (combined with the joint combiner) table vector $t$
- sorts all of that into a big vector $s$
- divides that vector into as many vectors as necessary, following the snake method
- interpolates these vectors into polynomials
- commits to them, and evaluates them as part of the protocol.
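The steps above (minus the interpolation and commitment) can be sketched as follows, with toy integer values standing in for combined field elements; `build_sorted_chunks` is an illustrative name:

```python
def build_sorted_chunks(queries, table, n):
    # sort the concatenation of queries and table following the table's
    # order, then chop it into domain-sized chunks for the snake
    order = {v: i for i, v in enumerate(table)}
    s = sorted(queries + table, key=lambda v: order[v])
    chunks = [s[i:i + n] for i in range(0, len(s), n)]
    # pad the last chunk by repeating its final entry
    chunks[-1] += [s[-1]] * (n - len(chunks[-1]))
    return chunks

table = [0, 1, 2, 3]       # combined lookup table
queries = [2, 0, 0, 3]     # combined per-row queries (may contain dummies)
chunks = build_sorted_chunks(queries, table, 4)
assert chunks == [[0, 0, 0, 1], [2, 2, 3, 3]]
```

Each chunk would then be laid out as a snake column, interpolated over the domain, and committed to.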
Foreign Field Addition RFC
This document is meant to explain the foreign field addition gate in Kimchi.
Overview
The goal of this gate is to perform the following operation between a left input $a$ and a right input $b$, to obtain a result $r \equiv a + s \cdot b \pmod f$, where $a, b, r$ belong to the foreign field of modulus $f$, we work over a native field with modulus $n$, and $s$ is a flag that is either $1$ or $-1$ to indicate whether it is an addition or subtraction gate.
If $f < n$, then we can easily perform the above computation natively. But in this gate we are interested in the contrary case, in which $f \geq n$. In order to deal with this, we will divide foreign field elements into limbs that fit in our native field. We want to be compatible with the foreign field multiplication gate, and thus the parameters we will be using are the following:
- 3 limbs of 88 bits each (for a total of 264 bits)
- our foreign field will have 256-bit length
- our native field has 255-bit length
In other words, using 3 limbs of 88 bits each allows us to represent any foreign field element in the range $[0, 2^{264})$ for foreign field addition, but only moduli of up to $259$ bits for foreign field multiplication. Thus, with the current configuration of our limbs, our foreign field must be smaller than $2^{259}$ (more on this in the FFmul RFC).
Splitting the addition
Let’s take a closer look at what we have, if we split our variables in limbs (using little endian)
bits 0..............87|88...........175|176...........263
a = (-------a0-------|-------a1-------|-------a2-------)
+/-
b = (-------b0-------|-------b1-------|-------b2-------)
=
r = (-------r0-------|-------r1-------|-------r2-------) mod(f)
We will perform the addition in 3 steps, one per limb. Now, when we split long additions in this way, we must be careful with carry bits between limbs. Also, even if $a$ and $b$ are foreign field elements, it could be the case that their sum is already larger than the modulus (such an addition could be at most $2f - 2$). But it could also be the case that the subtraction produces an underflow because $b > a$ (with a difference of at most $-f + 1$). Thus we will have to consider the more general case. That is,
$$a + s \cdot b = q \cdot f + r$$
with a field overflow term $q$ that will be either $0$ (if no underflow nor overflow is produced), $1$ (if there is overflow with $s = 1$), or $-1$ (if there is underflow with $s = -1$). Looking at this in limb form, we have:
bits 0..............87|88...........175|176...........263
a = (-------a0-------|-------a1-------|-------a2-------)
+
s = 1 | -1
·
b = (-------b0-------|-------b1-------|-------b2-------)
=
q = -1 | 0 | 1
·
f = (-------f0-------|-------f1-------|-------f2-------)
+
r = (-------r0-------|-------r1-------|-------r2-------)
First, if $a + b$ is larger than the modulus $f$, then we will have a field overflow (represented by $q = 1$) and we will have to subtract $f$ from the sum to obtain $r$. While the field overflow is handled by the term $q \cdot f$ in the equation above, the corresponding limb-wise subtraction may introduce carries (or even borrows) between the limbs. Therefore, in the equations for the limbs we will use a carry flag $c_i$ for limb $i$ to represent both carries and borrows. The carry flags can be anything in $\{-1, 0, 1\}$, where $-1$ represents a borrow and $1$ represents a carry. Next we see more clearly how this works.
In order to perform this operation in parts, we first take a look at the least significant limb, which is the easiest part. This means we want to know how to compute $r_0$. First, if the addition of the bits in $a_0$ and $s \cdot b_0$ produces a carry (or borrow) bit, then it should propagate to the second limb. That means one has to subtract $2^{88}$ from that result, add $1$ to the next limb, and set the low carry flag $c_0$ to $1$ (otherwise it is zero). Thus, the equation for the lowest limb is
$$a_0 + s \cdot b_0 = q \cdot f_0 + r_0 + c_0 \cdot 2^{88}$$
Or put another way, this is equivalent to saying that $a_0 + s \cdot b_0 - q \cdot f_0 - r_0$ is a multiple of $2^{88}$ (witnessed by the carry coefficient $c_0$).
This kind of equation needs an additional check that the carry coefficient $c_0$ is a correct carry value ($0$, $1$, or $-1$). We will use this idea for the remaining limbs as well.
Looking at the second limb, we first need to observe that the addition of $a_1$ and $s \cdot b_1$ can not only produce a carry bit $c_1$, but may also need to take into account the carry bit from the first limb, $c_0$. Similarly to the above,
$$a_1 + s \cdot b_1 + c_0 = q \cdot f_1 + r_1 + c_1 \cdot 2^{88}$$
Note that in this case, the carry coefficient $c_0$ from the least significant limb is not multiplied by $2^{88}$, but instead is used as is, since it occupies the least significant position of the second limb. Here, the second carry flag $c_1$ is the one followed by zeros. Again, we will have to check that $c_1$ is a correct carry value.
Finally, for the third limb, we obtain a similar equation. But in this case, we do not have to take $c_0$ into account anymore, since it was already considered within $c_1$. Here, the most significant carry flag should always be zero, so we can ignore it:
$$a_2 + s \cdot b_2 + c_1 = q \cdot f_2 + r_2$$
Graphically, this is what is happening:
bits 0..............87|88...........175|176...........263
a = (-------a0-------|-------a1-------|-------a2-------)
+
s = 1 | -1
·
b = (-------b0-------|-------b1-------|-------b2-------)
> > >
= c_0 c_1 c_2
q = -1 |0 | 1
·
f = (-------f0-------|-------f1-------|-------f2-------)
+
r = (-------r0-------|-------r1-------|-------r2-------)
Our witness computation currently uses the BigUint library, which takes care of all of the intermediate carry limbs itself, so we do not have to handle every corner case ourselves (such as propagating the low carry flag to the high limb in case the middle limb is all zeros).
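The limb equations above can be checked over the integers with a small Python sketch (plain big integers instead of native field elements; the names `to_limbs` and `ffadd_limbs` are illustrative):

```python
L = 88
B = 1 << L

def to_limbs(x):
    # split a non-negative integer < 2^264 into three 88-bit limbs
    return [x & (B - 1), (x >> L) & (B - 1), x >> (2 * L)]

def ffadd_limbs(a, b, s, f):
    # returns (r_limbs, q, carries) with a + s*b = q*f + r, for 0 <= a, b < f
    x = a + s * b
    q = 0 if 0 <= x < f else (1 if x >= f else -1)
    r = x - q * f
    a0, a1, a2 = to_limbs(a)
    b0, b1, b2 = to_limbs(b)
    f0, f1, f2 = to_limbs(f)
    r0, r1, r2 = to_limbs(r)
    c0 = (a0 + s * b0 - q * f0 - r0) // B        # low carry flag
    c1 = (a1 + s * b1 + c0 - q * f1 - r1) // B   # middle carry flag
    assert c0 in (-1, 0, 1) and c1 in (-1, 0, 1)
    assert a2 + s * b2 + c1 == q * f2 + r2       # top limb: no carry out
    return (r0, r1, r2), q, (c0, c1)

f = (1 << 256) - (1 << 32) - 977   # secp256k1 base field modulus
_, q, _ = ffadd_limbs(f - 1, f - 2, 1, f)    # overflow case
assert q == 1
_, q, _ = ffadd_limbs(1, 2, -1, f)           # underflow case (1 - 2)
assert q == -1
```

This mirrors the three limb equations: the top-limb assertion is exactly the "most significant carry is zero" condition.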
Upper bound check
Last but not least, we should perform some range checks to make sure that the result is contained in $[0, f)$. This is important because there could be other values of the result which still fit in $[0, 2^{264})$ but are larger than $f$, and we must make sure that the final result is the minimum one (which we will be referring to as $r$ in the following).
Ideally, we would like to reuse some gates that we already have. In particular, we can perform range checks for $[0, 2^{264})$. But we want to check that $r < f$. The way we can tweak this gate to behave as we want is the following. First, the above inequality is equivalent to $r - f < 0$. Then we add $2^{264}$ on both sides to obtain $r + 2^{264} - f < 2^{264}$. Thus, all there is left to check is that a given bound value $u$ is indeed computed correctly as $u = r + g$, where $g := 2^{264} - f$. Even though we could apply this check to all intermediate results of foreign field additions, it is sufficient to apply it only once at the end of the computations.
This is very similar to a foreign field addition, but simpler: the sign is positive ($s = 1$) and the right input is formed by the limbs standing for the number $g = 2^{264} - f$. There could be intermediate limb carry bits $k_0$ and $k_1$ as above. Observe that, because the sum is meant to be smaller than $2^{264}$, the carry bit for the most significant limb should always be zero ($k_2 = 0$), so we do not use it. Happily, we can apply the addition gate again to perform the addition limb-wise.
Calling $u$ the upper bound term, the equation can be expressed as $u = r + g$. Finally, we perform a range check on the sum $u$, and we would know that $r < f$.
bits 0..............87|88...........175|176...........263
r = (-------r0-------|-------r1-------|-------r2-------)
+
g = (-------g0-------|-------g1-------|-------g2-------)
> >
= k_0 k_1
f
+
u = (-------u0-------|-------u1-------|-------u2-------)
Following the steps above, and representing this equation in limb form, we have:
$$r_0 + g_0 = u_0 + k_0 \cdot 2^{88}$$
$$r_1 + g_1 + k_0 = u_1 + k_1 \cdot 2^{88}$$
$$r_2 + g_2 + k_1 = u_2$$
We also have to check that $u < 2^{264}$, but this is implicitly covered by $u$ being range-checked as a field element of at most 264 bits.
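The bound check reduces to ordinary integer arithmetic, which a short Python sketch can demonstrate (the name `bound_check` is illustrative; the circuit enforces the range with the existing range-check gadget):

```python
T = 1 << 264

def bound_check(r, f):
    # r < f  <=>  u := r + (2^264 - f) < 2^264,
    # so a plain 264-bit range check on u is enough
    u = r + (T - f)
    return 0 <= u < T

f = (1 << 256) - (1 << 32) - 977   # secp256k1 base field modulus
assert bound_check(f - 1, f)       # largest valid remainder passes
assert not bound_check(f, f)       # r = f is rejected
```

This is why a single final bound check suffices: every intermediate result already fits in $[0, 2^{264})$, and only the last one needs to be pinned below $f$.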
Subtractions
Mathematically speaking, a subtraction within a field is no more than an addition over that field. Negative elements are not different from "positive" elements in finite fields (or in any modular arithmetic). Our witness computation code computes negative sums by adding the modulus to the result. To give a general example, the element $-x$ within a field of order $p$ is nothing but $p - x$. Nonetheless, for arbitrarily sized elements (not just those smaller than the modulus), the actual field element could be any $k \cdot p - x$, for any multiple $k \cdot p$ of the modulus. Thus, representing negative elements directly as "absolute" field elements may incur additional computations involving multiplications, and thus would result in a less efficient mechanism.
Instead, our gate encodes subtractions and additions directly within the sign term $s$ that multiplies the right input. This way, there is no need to check that the negated value was computed correctly (which would require an additional row for a potential FFNeg gate).
Optimization
So far, one can recompute the result as follows:
bits 0..............87|88...........175|176...........263
r = (-------r0-------|-------r1-------|-------r2-------)
=
a = (-------a0-------|-------a1-------|-------a2-------)
+
s = 1 | -1
·
b = (-------b0-------|-------b1-------|-------b2-------)
-
q = -1 |0 | 1
·
f = (-------f0-------|-------f1-------|-------f2-------)
> > >
c_0 c_1 0
Inspired by the halving approach in foreign field multiplication, an optimized version of the above gate results in a reduction by 2 in the number of constraints and by 1 in the number of witness cells needed. The main idea is to condense the claims about the low and middle limbs into one single larger limb of 176 bits, which fits in our native field. This way, we can get rid of the low carry flag, its corresponding carry check, and the three result checks become just two.
bits 0..............87|88...........175|176...........263
r = (-------r0------- -------r1-------|-------r2-------)
=
a = (-------a0------- -------a1-------|-------a2-------)
+
s = 1 | -1
·
b = (-------b0------- -------b1-------|-------b2-------)
-
q = -1 |0 | 1
·
f = (-------f0------- -------f1-------|-------f2-------)
> >
c 0
These are the new equations:
$$a_{01} + s \cdot b_{01} = q \cdot f_{01} + r_{01} + c \cdot 2^{176}$$
$$a_2 + s \cdot b_2 + c = q \cdot f_2 + r_2$$
with $x_{01} := x_0 + 2^{88} \cdot x_1$ for each of $a$, $b$, $f$, $r$, and $c$ the single remaining carry flag.
Gadget
The foreign field gadget is composed of 4 sets of RangeCheck gadgets for the witnesses, accounting for 16 rows in total; followed by one row with the ForeignFieldAdd gate type; and a final ForeignFieldAdd with a Zero row. The first four rows constrain the range of the left input. The following four constrain the range of the right input. The next range check is for the result of the addition of left and right. Next, a final range check for the upper bound term. Then the foreign field addition gate performs the actual addition. And the final foreign field addition gate (followed by a Zero gate) takes care of the final upper bound operation. A total of 19 rows with 15 columns in Kimchi.
| Row(s) | Gate type(s) | Witness |
|---|---|---|
| 0-3 | multi-range-check | |
| 4-7 | multi-range-check | |
| 8-11 | multi-range-check | |
| 12-15 | multi-range-check | |
| 16 | ForeignFieldAdd | |
| 17 | ForeignFieldAdd | |
| 18 | Zero |
We have a mechanism to chain foreign field additions together. In this case, an initial left input range check is performed, followed by pairs of range-check gadgets for the right input and intermediate result (which becomes the left input for the next iteration). After the chained input checks, a final range check on the upper bound takes place. Then, there are the foreign field addition gates, and finally a foreign field addition gate for the upper bound (whose current row corresponds to the next row of the last foreign field addition gate), and an auxiliary Zero row that holds the upper bound.
More formally, these are the rows for the chained gadget:
| Row(s) | Gate type(s) | Witness |
|---|---|---|
| 0..3 | multi-range-check for left | |
| 8i+4..8i+7 | multi-range-check for right | |
| 8i+8..8i+11 | multi-range-check for result | |
| 8n+4..8n+7 | multi-range-check for bound | |
| 8n+i+8 | ForeignFieldAdd | |
| 9n+8 | ForeignFieldAdd | |
| 9n+9 | Zero |
Layout
For this gate, we need to perform 4 range checks to assert that the limbs of $a$, $b$, $r$ and $u$ have a correct size, meaning they fit in $[0, 2^{88})$ (and thus, range-checking the whole element for $[0, 2^{264})$). Because each of these elements is split into 3 limbs, we will have to use 3 copy constraints between the RangeCheck gates and the ForeignFieldAdd rows (per element). That amounts to 12 copy constraints. Recalling that Kimchi only allows for the first 7 columns of each row to host a copy constraint, we necessarily have to use 2 rows for the actual addition gate. The layout of these two rows is the following:
| Curr | Next | … Final | |
|---|---|---|---|
| Column | ForeignFieldAdd | ForeignFieldAdd | Zero |
| 0 | (copy) | (copy) | (copy) |
| 1 | (copy) | (copy) | (copy) |
| 2 | (copy) | (copy) | (copy) |
| 3 | (copy) | ||
| 4 | (copy) | ||
| 5 | (copy) | ||
| 6 | |||
| 7 | |||
| 8 | |||
| 9 | |||
| 10 | |||
| 11 | |||
| 12 | |||
| 13 | |||
| 14 |
Constraints
So far, we have pointed out the following sets of constraints:
Main addition
Carry checks
Sign checks
Overflow checks
Optimizations
When we use this gate as part of a larger chained gadget, we should optimize the number of range check rows to avoid redundant checks. In particular, if the result of an addition becomes one input of another addition, there is no need to check twice that the limbs of that term have the right length.
Foreign Field Multiplication RFC
This document explains how we constrain foreign field multiplication in Kimchi.
Changelog
| Author(s) | Date | Details |
|---|---|---|
| Joseph Spadavecchia and Anais Querol | October 2022 | Design of foreign field multiplication in Kimchi |
Overview
This gate constrains
$$a \cdot b \equiv r \pmod f$$
where $a, b, r \in \mathbb{F}_f$, a foreign field with modulus $f$, using the native field $\mathbb{F}_n$ with prime modulus $n$.
Approach
In foreign field multiplication the foreign field modulus could be bigger or smaller than the native field modulus . When the foreign field modulus is bigger, then we need to emulate foreign field multiplication by splitting the foreign field elements up into limbs that fit into the native field element size. When the foreign modulus is smaller everything can be constrained either in the native field or split up into limbs.
Since our projected use cases are when the foreign field modulus is bigger (more on this below) we optimize our design around this scenario. For this case, not only must we split foreign field elements up into limbs that fit within the native field, but we must also operate in a space bigger than the foreign field. This is because we check the multiplication of two foreign field elements by constraining its quotient and remainder. That is, we must constrain
$$a \cdot b = q \cdot f + r,$$
where the maximum sizes of $a$ and $b$ are $f - 1$, and so we have
$$a \cdot b \leq (f - 1)^2.$$
Thus, the maximum size of the multiplication is quadratic in the size of the foreign field.
Naïve approach
Naïvely, this implies that we have elements of up to twice the bit length of $f$ that we must split up into limbs no larger than the native field size. For example, if the foreign field modulus is $256$ bits and the native field modulus is $255$ bits, then we'd need $512$ bits and, thus, require at least $3$ native limbs. However, each limb cannot consume all bits of the native field element because we need space to perform arithmetic on the limbs themselves while constraining the foreign field multiplication. Therefore, we need to choose a limb size that leaves space for performing these computations.
Later in this document (see the section entitled "Choosing the limb configuration") we determine the optimal number of limbs that reduces the number of rows and gates required to constrain foreign field multiplication. This results in $\ell = 88$ bits as our optimal limb size. In the section about intermediate products we place some upper bounds on the number of bits required when constraining foreign field multiplication with limbs of size $\ell$, thereby proving that the computations can fit within the native field size.
Observe that by combining the naïve approach above with a limb size of $88$ bits, we would require $\lceil 512 / 88 \rceil = 6$ limbs for representing foreign field elements. Each limb is stored in a witness cell (a native field element). However, since each limb is necessarily smaller than the native field element size, it must be copied to the range-check gate to constrain its value. Since Kimchi only supports 7 copyable witness cells per row, this means that only one foreign field element can be stored per row. This means a single foreign field multiplication would consume at least 4 rows (just for the operands, quotient and remainder). This is not ideal because we want to limit the number of rows for improved performance.
Chinese remainder theorem
Fortunately, we can do much better than this by leveraging the chinese remainder theorem (CRT for short), as we will now show. The idea is to reduce the number of bits required by constraining our multiplication modulo two coprime moduli: $2^t$ and $n$. By constraining the multiplication with both moduli, the CRT guarantees that the constraints hold with the bigger modulus $2^t \cdot n$.
The advantage of this approach is that constraining with the native modulus $n$ is very fast, allowing us to speed up our non-native computations. This practically reduces the costs to constraining with limbs over $2^t$, where $2^t$ is something much smaller than $f^2$.
For this to work, we must select a value for $2^t$ that is big enough. That is, we select $t$ such that $2^t \cdot n > (f - 1)^2$. As described later on in this document, by doing so we reduce the number of bits required for our use cases to $t = 264$. With $88$-bit limbs this means that each foreign field element only consumes $3$ witness elements, and that the foreign field multiplication gate now only consumes $2$ rows. The section entitled "Choosing $t$" describes this in more detail.
Overall approach
Bringing it all together, our approach is to verify that
$$a \cdot b = q \cdot f + r \qquad (1)$$
over $\mathbb{Z}$. In order to do this efficiently we use the CRT, which means that the equation holds mod $M = 2^t \cdot n$. For the equation to hold over the integers we must also check that each side of the equation is less than $M$.
The overall steps are therefore
- Apply CRT to equation (1)
- Check validity with binary modulus $2^t$
- Check validity with native modulus $n$
- Check each side of equation (1) is less than $M = 2^t \cdot n$
This then implies that equation (1) holds over the integers, where $r < f$. That is, $a$ multiplied with $b$ is equal to $r$ modulo $f$. There is a lot more to each of these steps. That is the subject of the rest of this document.
Other strategies
Within our overall approach, aside from the CRT, we also use a number of other strategies to reduce the number and degree of constraints.
- Avoiding borrows and carries
- Intermediate product elimination
- Combining multiplications
The rest of this document describes each part in detail.
Parameter selection
This section describes important parameters that we require and how they are computed.
- Native field modulus $n$
- Foreign field modulus $f$
- Binary modulus $2^t$
- CRT modulus $2^t \cdot n$
- Limb size in bits $\ell$
Choosing $t$
Under the hood, we constrain $a \cdot b = q \cdot f + r$. Since this is a multiplication in the foreign field $f$, the maximum size of $a$ and $b$ is $f - 1$, so this means
$$a \cdot b \leq (f - 1)^2.$$
Therefore, we need the modulus $2^t \cdot n$ such that
$$2^t \cdot n > (f - 1)^2,$$
which is the same as saying, given $f$ and $n$, we must select $t$ such that
$$2^t > \frac{(f - 1)^2}{n}.$$
Thus, we have a lower bound on $t$.
Instead of dynamically selecting $t$ for every $f$ and $n$ combination, we fix a $t$ that will work for the different selections of $f$ and $n$ relevant to our use cases.
To guide us, from above we know that
$$t > 2 \log_2(f - 1) - \log_2 n$$
and we know the field moduli for our immediate use cases.
vesta = 2^254 + 45560315531506369815346746415080538113 (255 bits)
pallas = 2^254 + 45560315531419706090280762371685220353 (255 bits)
secp256k1 = 2^256 - 2^32 - 2^9 - 2^8 - 2^7 - 2^6 - 2^4 - 1 (256 bits)
So we can create a table:

| Native modulus $n$ | Foreign modulus $f$ | Bits required for $t$ |
|---|---|---|
| vesta | secp256k1 | 258 |
| pallas | secp256k1 | 258 |
| vesta | pallas | 255 |
| pallas | vesta | 255 |

and know that to cover our use cases we need $t \geq 258$.
Next, given our native modulus $n$ and $t$, we can compute the maximum foreign field modulus supported. Actually, we compute the maximum supported bit length of the foreign field modulus $f$.
So we get
$$\log_2 f < \frac{t + \log_2 n}{2}.$$
With $t = 258$ we have
$$\log_2 f < \frac{258 + 254}{2} = 256,$$
which is not enough space to handle anything larger than 256-bit moduli. Instead, we will use $t = 264$, giving a maximum modulus size of $\frac{264 + 254}{2} = 259$ bits.
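The bound above is easy to check numerically. A small Python sketch (the function name is illustrative, and $n$ is approximated by $2^{254}$, a slight underestimate of the Pallas/Vesta moduli):

```python
from math import isqrt

def max_foreign_modulus_bits(t, n):
    # the CRT constraint 2^t * n > f^2 bounds the largest foreign modulus:
    # f_max = isqrt(2^t * n - 1), i.e. roughly (t + log2 n) / 2 bits
    return isqrt((1 << t) * n - 1).bit_length()

n = 1 << 254                              # stand-in for the ~255-bit native moduli
assert max_foreign_modulus_bits(258, n) == 256   # too tight for 256-bit f
assert max_foreign_modulus_bits(264, n) == 259   # the chosen configuration
```

With $t = 258$ the largest representable modulus sits right at the 256-bit boundary, which is why the design moves to $t = 264$.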
Choosing the limb configuration
Choosing the right limb size and the right number of limbs is a careful balance between the number of constraints (i.e. the polynomial degree) and the witness length (i.e. the number of rows). Because one limiting factor that we have in Kimchi is the 12-bit maximum for range check lookups, we could be tempted to introduce 12-bit limbs. However, this would mean having many more limbs, which would consume more witness elements and require significantly more rows. It would also increase the polynomial degree by increasing the number of constraints required for the intermediate products (more on this later).
We need to find a balance between the number of limbs and the size of the limbs. The limb configuration is dependent on the value of and our maximum foreign modulus (as described in the previous section). The larger the maximum foreign modulus, the more witness rows we will require to constrain the computation. In particular, each limb needs to be constrained by the range check gate and, thus, must be in a copyable (i.e. permuteable) witness cell. We have at most 7 copyable cells per row and gates can operate on at most 2 rows, meaning that we have an upperbound of at most 14 limbs per gate (or 7 limbs per row).
As stated above, we want the foreign field multiplication to fit in as few rows as possible and we need to constrain the operands $a$ and $b$, the quotient $q$ and the remainder $r$. Each of these will require one witness cell per limb. Thus, the number of cells required for these is
$$4 \cdot \#\mathsf{limbs}.$$
It is highly advantageous for performance to constrain foreign field multiplication with the minimal number of gates. This not only helps limit the number of rows, but also keeps the gate selector polynomial small. Because of all of this, we aim to constrain foreign field multiplication with a single gate (spanning at most $2$ rows). As mentioned above, we have a maximum of 14 permutable cells per gate, so we can compute the maximum number of limbs that fit within a single gate like this:
$$\#\mathsf{limbs} = \left\lfloor \frac{14}{4} \right\rfloor = 3.$$
Thus, the maximum number of limbs possible in a single gate configuration is 3.
Using $t = 264$ that covers our use cases (see the previous section), we are finally able to derive our limb size
$$\ell = \frac{t}{\#\mathsf{limbs}} = \frac{264}{3} = 88 \text{ bits}.$$
Therefore, our limb configuration is:
- Limb size $\ell = 88$ bits
- Number of limbs is $3$
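The derivation above is just two integer divisions, which a tiny Python sketch makes explicit (variable names are illustrative):

```python
# 14 permutable cells per 2-row gate, 4 copied elements (a, b, q, r)
max_copyable_cells = 14
num_elements = 4
num_limbs = max_copyable_cells // num_elements   # floor(14 / 4) = 3

t = 264                        # binary modulus exponent chosen above
limb_bits = t // num_limbs     # 264 / 3 = 88

assert (num_limbs, limb_bits) == (3, 88)
```

Any larger limb count would exceed the 14 copyable cells available to a single two-row gate.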
Avoiding borrows
When we constrain $a \cdot b = q \cdot f + r$ we want to be as efficient as possible.
Observe that the expansion of $a \cdot b - q \cdot f - r$ into limbs would have subtractions between limbs, requiring our constraints to account for borrowing. Dealing with this would create undesired complexity and increase the degree of our constraints.
In order to avoid the possibility of subtractions we instead use $f'$, where
$$f' = 2^t - f.$$
The negated modulus $f'$ becomes part of our constraint system and is not constrained because it is publicly auditable.
Using the substitution of the negated modulus, we now must constrain
$$a \cdot b + q \cdot f' = q \cdot 2^t + r.$$
Observe that $f' < 2^t$ and that, since $q \geq 0$, the left-hand side now involves only additions, so no borrows can occur between limbs.
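The substituted identity can be verified on sample values with a few lines of Python (toy operand values; the secp256k1 modulus is used as the foreign field):

```python
t = 264
f = (1 << 256) - (1 << 32) - 977   # secp256k1 base field modulus
f_neg = (1 << t) - f               # the negated modulus f'

a = 3**100 % f
b = 5**90 % f
q, r = divmod(a * b, f)

# a*b = q*f + r  rewritten with only additions on the left:
assert a * b + q * f_neg == q * (1 << t) + r
```

Since $q \geq 0$ and $f' \geq 0$, every term on the left is non-negative, which is exactly what lets the limb constraints avoid borrow handling.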
Intermediate products
This section explains how we expand our constraints into limbs and then eliminate a number of extra terms.
We must constrain $a \cdot b + q \cdot f' = q \cdot 2^t + r$ on the limbs, rather than as a whole. As described above, each foreign field element $x$ is split into three 88-bit limbs $x_0, x_1, x_2$, where $x_0$ contains the least significant bits and $x_2$ contains the most significant bits.
For clarity, let $x = x_0 + 2^{\ell} x_1 + 2^{2\ell} x_2$ denote the limb decomposition of each element; then expanding the left-hand side into limbs we have
$$a \cdot b + q \cdot f' = \sum_{0 \leq i, j \leq 2} 2^{(i+j)\ell} \left( a_i b_j + q_i f'_j \right).$$
Notice that the possible scales are $2^0, 2^{\ell}, 2^{2\ell}, 2^{3\ell}$ and $2^{4\ell}$, so the above simplifies to five groups of terms.
Recall that $t = 3\ell$, and observe that $2^{3\ell}$ and $2^{4\ell}$ are multiples of $2^t$. Therefore, the terms with scale $2^{3\ell}$ or $2^{4\ell}$ are a multiple of the binary modulus and, thus, congruent to zero $\bmod\ 2^t$. They can be eliminated and we don't need to compute them. So we are left with 3 intermediate products that we call $p_0$, $p_1$ and $p_2$:
| Term | Scale | Product |
|---|---|---|
| $p_0$ | $2^0$ | $a_0b_0 + q_0f'_0$ |
| $p_1$ | $2^{\ell}$ | $a_0b_1 + a_1b_0 + q_0f'_1 + q_1f'_0$ |
| $p_2$ | $2^{2\ell}$ | $a_0b_2 + a_1b_1 + a_2b_0 + q_0f'_2 + q_1f'_1 + q_2f'_0$ |
- Computation of the intermediate products $p_0$, $p_1$ and $p_2$
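A quick Python sketch of the three intermediate products on toy values (the helper names are illustrative, and plain big integers stand in for field elements):

```python
L = 88

def limbs3(x):
    # three 88-bit limbs of a non-negative integer < 2^264
    m = (1 << L) - 1
    return [x & m, (x >> L) & m, x >> (2 * L)]

def intermediate_products(a, b, q, f_neg):
    # schoolbook limb products; terms scaled by 2^(3L) or 2^(4L) are
    # dropped because they vanish modulo 2^t with t = 3L
    a0, a1, a2 = limbs3(a)
    b0, b1, b2 = limbs3(b)
    q0, q1, q2 = limbs3(q)
    g0, g1, g2 = limbs3(f_neg)
    p0 = a0 * b0 + q0 * g0
    p1 = a0 * b1 + a1 * b0 + q0 * g1 + q1 * g0
    p2 = a0 * b2 + a1 * b1 + a2 * b0 + q0 * g2 + q1 * g1 + q2 * g0
    return p0, p1, p2

f = (1 << 256) - (1 << 32) - 977        # secp256k1 base field modulus
f_neg = (1 << 264) - f
a, b = 3 * 10**70 % f, 7 * 10**75 % f
q, r = divmod(a * b, f)
p0, p1, p2 = intermediate_products(a, b, q, f_neg)
# p0 + 2^L p1 + 2^2L p2 ≡ a*b + q*f' ≡ r  (mod 2^264)
assert (p0 + (p1 << L) + (p2 << (2 * L))) % (1 << 264) == r
```

The final assertion is the binary-modulus half of the CRT check: once the dropped terms are gone, the weighted sum of the three products must equal the remainder modulo $2^t$.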
Constraining
Let's call $p := p_0 + 2^{\ell} p_1 + 2^{2\ell} p_2$. Remember that our goal is to constrain that $p - r \equiv 0 \pmod{2^t}$ (recall that any bits more significant than the 264th are ignored modulo $2^t$). Decomposing that claim into limbs, that means
We face two challenges
- Since $p_0$, $p_1$ and $p_2$ are at least $2\ell$ bits each, the right side of the equation above does not fit in our native field
- The subtraction of the remainder's limbs $r_0$, $r_1$ and $r_2$ could require borrowing
For the moment, let’s not worry about the possibility of borrows and instead focus on the first problem.
Combining multiplications
The first problem is that our native field is too small to constrain the whole equation in one piece. We could split this up by multiplying $a \cdot b$ and $q \cdot f'$ separately and create constraints that carefully track borrows and carries between limbs. However, a more efficient approach is to combine the whole computation together and accumulate all the carries and borrows in order to reduce their number.
The trick is to assume a space large enough to hold the computation, view the outcome in binary and then split it up into chunks that fit in the native modulus.
To this end, it helps to know how many bits these intermediate products require. On the left side of the equation, is at most bits. We can compute this by substituting the maximum possible binary values (all bits set to 1) into like this
So fits in bits. Similarly, needs at most bits and is at most bits.
| Term | |||
|---|---|---|---|
| Bits |
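We can double-check the claimed bit lengths numerically. This sketch assumes the intermediate products consist of 2, 4, and 6 products of maximal L-bit limbs respectively, matching the expansion above:

```python
# Check the maximum bit lengths of the intermediate products by
# substituting maximal L-bit limb values (all bits set to 1).
L = 88
m = 2**L - 1  # maximal limb value

p0_max = 2 * m * m  # two products of two L-bit values
p1_max = 4 * m * m  # four products
p2_max = 6 * m * m  # six products

assert p0_max.bit_length() == 2 * L + 1
assert p1_max.bit_length() == 2 * L + 2
assert p2_max.bit_length() == 2 * L + 3
```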
The diagram below shows the right hand side of the zero-sum equality from equation (2). That is, the value . Let’s look at how the different bits of and impact it.
0 L 2L 3L 4L
|-------------|-------------|-------------|-------------|-------------|
:
|--------------p0-----------:-| 2L + 1
:
|-------------:-p1------------| 2L + 2
p10➚ : p11➚
|----------------p2-------------| 2L + 3
:
|-----r0------| :
:
|-----r1------|
:
|-----r2------|
\__________________________/ \______________________________/
≈ h0 ≈ h1
Within our native field modulus we can fit up to bits, for small values of (but sufficient for our case). Thus, we can only constrain approximately half of at a time. In the diagram above the vertical line at 2L bisects into two bit values: and (the exact definition of these values follow). Our goal is to constrain and .
Computing the zero-sum halves: and
Now we can derive how to compute and from and .
The direct approach would be to bisect both and and then define as just the sum of the lower bits of and minus and . Similarly would be just the sum of upper bits of and minus . However, each bisection requires constraints for the decomposition and range checks for the two halves. Thus, we would like to avoid bisections as they are expensive.
Ideally, if our ’s lined up on the boundary, we would not need to bisect at all. However, we are unlucky and it seems like we must bisect both and . Fortunately, we can at least avoid bisecting by allowing it to be summed into like this
Note that is actually greater than bits in length. This may not only be because it contains whose length is , but also because adding may cause an overflow. The maximum length of is computed by substituting in the maximum possible binary value of for the added terms and for the subtracted terms of the above equation.
which is bits.
N.b. This computation assumes correctly sized values for and , which we assure by range checks on the limbs.
Next, we compute as
The maximum size of is computed as
In order to obtain the maximum value of , we define . Since the maximum value of was , then the maximum value of is . For , the maximum value was , and thus:
which is bits.
| Term | ||
|---|---|---|
| Bits |
Thus far we have the following constraints
- Composition of and result in
- Range check
- Range check
For the next step we would like to constrain and to zero. Unfortunately, we are not able to do this!
-
Firstly, as defined may not be zero because we have not bisected it precisely at bits, but have allowed it to contain the full bits. Recall that these two additional bits are because is at most bits long, but also because adding increases it to . These two additional bits are not part of the first bits of and, thus, are not necessarily zero. That is, they are added from the second bits (i.e. ).
-
Secondly, whilst the highest bits of would wrap to zero , when placed into the smaller bit in the native field, this wrapping does not happen. Thus, ’s highest bits may be nonzero.
We can deal with this non-zero issue by computing carry witness values.
Computing carry witness values and
Instead of constraining and to zero, there must be satisfying witnesses and such that the following constraints hold.
- There exists such that
- There exists such that
Here is the last two bits of ’s bits, i.e., the result of adding the highest bit of and any possible carry bit from the operation of . Similarly, corresponds to the highest bits of . It looks like this
0 L 2L 3L 4L
|-------------|-------------|-------------|-------------|-------------|
:
|--------------h0-----------:--| 2L + 2
: ↖v0
:-------------h1-------------| 2L + 3
: \____________/
: v1➚
Remember we only need to prove the first bits of are zero, since everything is and . It may not be clear how this approach proves the bits are indeed zero because within and there are bits that are nonzero. The key observation is that these bits are too high for .
By making the argument with and we are proving that is something where the least significant bits are all zeros and that is something where the are also zeros. Any nonzero bits after do not matter, since everything is .
All that remains is to range check and
- Range check
- Range check
Subtractions
When unconstrained, computing could require borrowing. Fortunately, we have the constraint that the least significant bits of are 0 (i.e. ), which means borrowing cannot happen.
Borrowing is prevented because when the least significant bits of the result are 0 it is not possible for the minuend to be less than the subtrahend. We can prove this by contradiction.
Let the least significant bits of be 0.
Suppose that borrowing occurs, that is, that . Recall that the length of is at most bits. Therefore, the top two bits of must be zero, and so we have
where denotes the least significant bits of .
Recall also that the length of is bits. We know this because limbs of the result are each constrained to be in . So the result of this subtraction is bits. Since the least significant bits of the subtraction are 0 this means that
which is a contradiction with .
Costs
Range checks should be the dominant cost, let’s see how many we have.
Range check (3) requires two range checks for
- a)
- b)
Range check (8) requires two range checks and a decomposition check that is merged in (6).
- a)
- b)
The range checks on and follow from the range checks on and .
So we have 3.a, 3.b, 4, 7, 8.a, 8.b.
| Range check | Gate type(s) | Witness | Rows |
|---|---|---|---|
| 7 | < 1 | ||
| 3.a | < 1 | ||
| 8.b | degree-8 constraint or plookup | 1 | |
| 3.b, 4, 8.a | multi-range-check | 4 |
So we have 1 multi-range-check, 1 single-range-check and 2 low-degree range checks. This consumes just over 5 rows.
Use CRT to constrain
Until now we have constrained the equation , but remember that our application of the CRT means that we must also constrain the equation . We are leveraging the fact that if the identity holds for all moduli in , then it holds for .
Thus, we must check , which is over .
This gives us equality as long as the divisors are coprime. That is, as long as . Since the native modulus is prime, this is true.
Thus, this check is simple to perform. We compute
using our native field modulus with constraints like this
and then constrain
Note that we do not use the negated foreign field modulus here.
This requires a single constraint of the form
with all of the terms expanded into the limbs according to the above equations. The values and do not need to be in the witness.
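The CRT argument can be illustrated with small stand-in moduli (toy values, not the real power of two and native modulus):

```python
# Toy illustration of the CRT argument with coprime moduli M1, M2 standing
# in for 2^t and the native modulus n: if a*b - q*f - r is 0 both mod M1
# and mod M2, and gcd(M1, M2) == 1, then it is 0 mod M1*M2.
from math import gcd

M1, M2 = 2**16, 65537  # coprime stand-ins
assert gcd(M1, M2) == 1

f = 65521              # toy "foreign" modulus
a, b = 12345, 6789
q, r = divmod(a * b, f)

lhs = a * b - q * f - r
assert lhs % M1 == 0 and lhs % M2 == 0
# By the CRT the identity holds modulo M1*M2; since the magnitude of
# a*b - q*f - r is below M1*M2, it holds over the integers as well.
assert lhs % (M1 * M2) == 0
```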
Range check both sides of
Until now we have constrained that equation holds modulo and modulo , so by the CRT it holds modulo . Remember that, as outlined in the “Overview” section, we must prove our equation over the positive integers, rather than . By doing so, we assure that our solution is not equal to some where , in which case or would be invalid.
First, let’s consider the right hand side . We have
Recall that we have parameterized , so if we can bound and such that
then we have achieved our goal. We know that must be less than , so that is our first check, leaving us with
Therefore, to check , we need to check
This should come as no surprise, since that is how we parameterized earlier on. Note that by checking we assure correctness, while checking assures our solution is unique (sometimes referred to as canonical).
Next, we must perform the same checks for the left hand side (i.e., ). Since and must be less than the foreign field modulus , this means checking
So we have
Since we have
Bound checks
To perform the above range checks we use the upper bound check method described in the Foreign Field Addition RFC.
The upper bound check is as follows. We must constrain over the positive integers, so
Remember is our negated foreign field modulus. Thus, we have
So to check we just need to compute and check
Observe that
and that
So we only need to check
The first check is always true since we are operating over the positive integers and . Observe that the second check also constrains that , since and thus
Therefore, to constrain we only need constraints for
and we don’t require an additional multi-range-check for .
Cost of bound checks
This section analyzes the structure and costs of bounds checks for foreign field addition and foreign field multiplication.
Addition
In our foreign field addition design the operands and do not need to be less than . The field overflow bit for foreign field addition is at most 1. That is, , where is allowed to be greater than . Therefore,
These additions can be chained together as many times as desired. The final result
Since the bit length of increases logarithmically with the number of additions, in Kimchi we must only check that the final in the chain is less than to constrain the entire chain.
Security note: In order to defer the check to the end of any chain of additions, it is extremely important to consider the potential impact of wraparound in . That is, we need to consider whether the addition of a large chain of elements greater than the foreign field modulus could wrap around. If this could happen then the check could fail to detect an invalid witness. Below we will show that this is not possible in Kimchi.
Recall that our foreign field elements are comprised of 3 limbs of 88 bits each that are each represented as native field elements in our proof system. In order to wrap around and circumvent the check, the highest limb would need to wrap around. This means that an attacker would need to perform about additions of elements greater than the foreign field modulus. Since Kimchi’s native moduli (Pallas and Vesta) are 255-bits, the attacker would need to provide a witness for about additions. This length of witness is greater than Kimchi’s maximum circuit (resp. witness) length. Thus, it is not possible for the attacker to generate a false proof by causing wraparound with a large chain of additions.
In summary, for foreign field addition in Kimchi it is sufficient to only bound check the last result in a chain of additions (and subtractions)
- Compute bound with addition gate (2 rows)
- Range check (4 rows)
Multiplication
In foreign field multiplication, the situation is unfortunately different, and we must check that each of and are less than . We cannot adopt the strategy from foreign field addition where the operands are allowed to be greater than because the bit length of would increase linearly with the number of multiplications. That is,
and after a chain of multiplication we have
where quickly overflows our CRT modulus . For example, assuming our maximum foreign modulus of and either of Kimchi’s native moduli (i.e. Pallas or Vesta), for . That is, an overflow is possible for a chain of more than one foreign field multiplication. Thus, we must check and are less than for each multiplication.
Fortunately, in many situations the input operands may already be checked either as inputs or as outputs of previous operations, so they may not be required for each multiplication operation.
Thus, the and checks (or equivalently and ) are our main focus because they must be done for every multiplication.
- Compute bound with addition gate (2 rows)
- Compute bound with addition gate (2 rows)
- Range check (4 rows)
- Range check (4 rows)
This costs 12 rows per multiplication. In a subsequent section, we will reduce it to 8 rows.
2-limb decomposition
Due to the limited number of permutable cells per gate, we do not have enough cells for copy constraining and (or and ) to their respective range check gadgets. To address this issue, we must decompose into 2 limbs instead of 3, like so
and
Thus, is decomposed into two limbs (at most bits) and (at most bits).
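The recombination can be sketched over plain integers (hypothetical helper name):

```python
# Sketch of the 2-limb decomposition: combine the two low 88-bit limbs
# into a single limb of at most 2L = 176 bits, keeping the top limb
# (at most L = 88 bits) separate.
L = 88

def to_2_limbs(x0, x1, x2):
    x01 = x0 + (x1 << L)  # at most 2L bits
    return x01, x2

x0, x1, x2 = 17, 2**88 - 1, 5
x01, top = to_2_limbs(x0, x1, x2)
# the 2-limb form represents the same value as the 3-limb form
assert x01 + (top << (2 * L)) == x0 + (x1 << L) + (x2 << (2 * L))
assert x01.bit_length() <= 2 * L
```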
Note that must be range checked by a multi-range-check gadget. To do this the multi-range-check gadget must
- Store a copy of the limbs and in its witness
- Range check that they are bit each
- Constrain that (this is done with a special mode of the `multi-range-check` gadget)
- Have copy constraints for and as outputs of the `ForeignFieldMul` gate and inputs to the `multi-range-check` gadget
Note that since the foreign field multiplication gate computes from which is already in the witness and and have copy constraints to a multi-range-check gadget that fully constrains their decomposition from , then the ForeignFieldMul gate does not need to store an additional copy of and .
An optimization
Since the and checks must be done for each multiplication it makes sense to integrate them into the foreign field multiplication gate. By doing this we can save 4 rows per multiplication.
Doing this doesn’t require adding a lot more witness data because the operands for the bound computations and are already present in the witness of the multiplication gate. We only need to store the bounds and in permutable witness cells so that they may be copied to multi-range-check gates to check they are each less than .
To constrain , the equation we use is
where is the original value, is the field overflow bit and is the remainder and our desired addition result (e.g. the bound). Rearranging things we get
which is just
Recall from the section “Avoiding borrows” that is often larger than . At first this seems like it could be a problem because in multiplication each operation must be less than . However, this is because the maximum size of the multiplication was quadratic in the size of (we use the CRT, which requires the bound that ). However, for addition the result is much smaller and we do not require the CRT nor the assumption that the operands are smaller than . Thus, we have plenty of space in -bit limbs to perform our addition.
So, the equation we need to constrain is
We can expand the left hand side into the 2 limb format in order to obtain 2 intermediate sums
where and are defined like this
and and are defined like this
Going back to our intermediate sums, the maximum bit length of sum is computed from the maximum bit lengths of and
which means is at most bits long.
Similarly, since and are less than , the max value of is
which means is at most bits long.
Thus, we must constrain
The accumulation of this into parts looks like this.
0 L 2L 3L=t 4L
|-------------|-------------|-------------|-------------|-------------|
:
|------------s01------------:-| 2L + 1
: ↖w01
|------s2-----:-| L + 1
: ↖w2
:
|------------x'01-----------|
:
|------x'2----|
:
\____________________________/
≈ z01 \_____________/
≈ z2
The two parts are computed with
Therefore, there are two carry bits and such that
In this scheme and are witness data, whereas and are formed from a constrained computation of witness data and constraint system public parameter . Note that due to carrying, witness and can be different than the values and computed from the limbs.
Thus, each bound addition requires the following witness data
where is baked into the constraint system. The following constraints are needed
Due to the limited number of copyable witness cells per gate, we are currently only performing this optimization for .
The witness data is
The checks are
Checks (1) - (5) assure that is at most bits. Whereas checks (10) - (11) assure that is at most bits. Altogether they comprise a single multi-range-check of and . However, as noted above, we do not have enough copyable cells to output and to the multi-range-check gadget. Therefore, we adopt a strategy where the 2 limbs and are output to the multi-range-check gadget where the decomposition of and into is constrained and then and are range checked.
Although and are not range checked directly, this is safe because, as shown in the “Bound checks” section, range-checking that also constrains that . Therefore, the updated checks are
-
multi-range-check -
multi-range-check -
ForeignFieldMul -
ForeignFieldMul -
ForeignFieldMul -
multi-range-check -
ForeignFieldMul -
ForeignFieldMul -
multi-range-check -
ForeignFieldMul -
ForeignFieldMul -
ForeignFieldMul
Note that we don’t need to range-check is at most bits because it is already implicitly constrained by the multi-range-check gadget constraining that and are each at most bits and that . Furthermore, since constraining the decomposition is already part of the multi-range-check gadget, we do not need to do it here also.
To simplify things further, we can combine some of these checks. Recall our checked computations for the intermediate sums
where and . These do not need to be separate constraints, but are instead part of existing ones.
Checks (10) and (11) can be combined into a single constraint . Similarly, checks (3) - (5) and (8) can be combined into with and further expanded. The final minimal number of constraints are
-
multi-range-check -
multi-range-check -
multi-range-check -
ForeignFieldMul -
ForeignFieldMul -
multi-range-check -
ForeignFieldMul -
ForeignFieldMul
Since we already needed to range-check or , the total number of new constraints added is 5: 4 added to `ForeignFieldMul` and 1 added to the `multi-range-check` gadget for constraining the decomposition of .
This saves 2 rows per multiplication.
Chaining multiplications
Due to the limited number of cells accessible to gates, we are not able to chain multiplications into multiplications. We can chain foreign field additions into foreign field multiplications, but currently do not support chaining multiplications into additions (though there is a way to do it).
Constraining the computation
Now we collect all of the checks that are required to constrain foreign field multiplication
1. Range constrain
If the operand has not been constrained to by any previous foreign field operations, then we constrain it like this
- Compute bound with addition gate (2 rows)
ForeignFieldAdd - Range check (4 rows)
multi-range-check
2. Range constrain
If the operand has not been constrained to by any previous foreign field operations, then we constrain it like this
- Compute bound with addition gate (2 rows)
ForeignFieldAdd - Range check (4 rows)
multi-range-check
3. Range constrain
The quotient is constrained to for each multiplication as part of the multiplication gate
- Compute bound with
ForeignFieldMulconstraints - Range check (4 rows)
multi-range-check - Range check
ForeignFieldMul(implicit by storingqin witness)
4. Range constrain
The remainder is constrained to for each multiplication using an external addition gate.
- Compute bound with addition gate (2 rows)
ForeignFieldAdd - Range check (4 rows)
multi-range-check
5. Compute intermediate products
Compute and constrain the intermediate products and as:
-
ForeignFieldMul -
ForeignFieldMul -
ForeignFieldMul
where each of them is about -length elements.
6. Native modulus checked computations
Compute and constrain the native modulus values, which are used to check the constraints modulo in order to apply the CRT
7. Decompose middle intermediate product
Check that :
-
ForeignFieldMul - Range check
multi-range-check - Range check
-
ForeignFieldMul - Range check
multi-range-check - Range check with a degree-4 constraint
ForeignFieldMul
-
8. Zero sum for multiplication
Now we have to constrain the zero sum
We constrain the first and the second halves as
-
ForeignFieldMul -
ForeignFieldMul
And some more range checks
- Check that with a degree-4 constraint
ForeignFieldMul - Check that
- Check
ForeignFieldMul - Check
ForeignFieldMul - Check with range constraint
multi-range-check
- Check
To check that (i.e. that is at most 3 bits long) we first range-check with a 12-bit plookup. This means there can be no higher bits set beyond the 12 bits of . Next, we scale by in order to move the highest bits beyond the th bit. Finally, we perform a 12-bit plookup on the resulting value. That is, we have
- Check with a 12-bit plookup (to prevent any overflow)
- Check
- Check is a 12-bit value with a 12-bit plookup
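The scaling trick can be sanity-checked over plain integers; `fits_12_bits` is a stand-in for the 12-bit plookup, and the scale factor 2^9 below is the one that pushes any bit at position 3 or above past the 12-bit boundary:

```python
# Sketch of constraining v1 < 2^3 using only 12-bit lookups: first check
# v1 < 2^12, then check that 2^9 * v1 is still a 12-bit value.
def fits_12_bits(x):
    return 0 <= x < 2**12  # stand-in for a 12-bit plookup

def check_3_bits(v1):
    return fits_12_bits(v1) and fits_12_bits(v1 * 2**9)

assert check_3_bits(7)      # 3-bit values pass
assert not check_3_bits(8)  # 8 * 2^9 = 2^12 fails the second lookup
```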
9. Native modulus constraint
Using the checked native modulus computations we constrain that
10. Compute intermediate sums
Compute and constrain the intermediate sums and as:
11. Decompose the lower quotient bound
Check that .
Done by (3) above with the multi-range-check on
- Range check
- Range check
12. Zero sum for quotient bound addition
We constrain that
by constraining the two halves
We also need a couple of small range checks
- Check that is boolean
ForeignFieldMul - Check that is boolean
ForeignFieldMul
Layout
Based on the constraints above, we need the following 12 values copied from the range check gates.
a0, a1, a2, b0, b1, b2, q0, q1, q2, r0, r1, r2
Since we need 12 copied values for the constraints, they must span 2 rows.
The bound limbs and must be in copyable cells so they can be range-checked. Similarly, the limbs of the operands , and the result must all be in copyable cells. This leaves only 2 remaining copyable cells and, therefore, we cannot compute and output . It must be deferred to an external ForeignFieldAdd gate with the cells copied as an argument.
NB: the and values are publicly accessible in the constraint system.
| Curr | Next | |
|---|---|---|
| Column | ForeignFieldMul | Zero |
| 0 | (copy) | (copy) |
| 1 | (copy) | (copy) |
| 2 | (copy) | (copy) |
| 3 | (copy) | (copy) |
| 4 | (copy) | (copy) |
| 5 | (copy) | (copy) |
| 6 | (copy) | (copy) |
| 7 | (plookup) | |
| 8 | (plookup) | |
| 9 | ||
| 10 | ||
| 11 | ||
| 12 | ||
| 13 | ||
| 14 |
where .
Checked computations
As described above foreign field multiplication has the following intermediate computations
- .
For the bound addition we must also compute
Note the equivalence
where and can be done with checked computations.
Next, for applying the CRT we compute
Checks
In total we require the following checks
-
multi-range-check -
multi-range-check -
multi-range-check -
multi-range-check -
multi-range-check -
multi-range-check -
multi-range-check
Constraints
These checks can be condensed into the minimal number of constraints as follows.
First, we have the range-check corresponding to (1) as a degree-4 constraint
C1:
Checks (2) - (4) are all handled by a single multi-range-check gadget.
C2: multi-range-check
Next (5) and (6) can be combined into
C3:
Now we have the range-check corresponding to (7) as another degree-4 constraint
C4:
Next we have check (8)
C5:
Up next, checks (9) and (10) are 12-bit range checks
C6: Plookup
C7: Plookup
Check (11) is part of bounding to 3-bits in length
C8:
Now checks (12) and (13) can be combined into
C9:
Next, for our use of the CRT, we must constrain that . Thus, check (14) is
C10:
Next we must constrain the quotient bound addition.
Checks (15) - (18) are all combined into multi-range-check gadget
C11: multi-range-check and .
Check (19) is a carry bit boolean check
C12:
Next, check (20) is
C13:
Check (21) is another boolean check
C14:
Finally, check (22) is
C15:
The Zero gate has no constraints and is just used to hold values required by the ForeignFieldMul gate.
External checks
The following checks must be done with other gates to assure the soundness of the foreign field multiplication
- Range check input
multi-range-checkmulti-range-check
- Range check witness data
multi-range-checkand checkmulti-range-check-bit limbs:
- Range check output
multi-range-checkeither or
- Compute and constrain the bound on
ForeignFieldAdd
Copy constraints must connect the above witness cells to their respective input cells within the corresponding external check gates witnesses.
RFC: Keccak
The Keccak gadget is comprised of 3 circuit gates (Xor16, Rot64, and Zero)
Keccak works with 64-bit words. The state is represented as a 5×5 matrix of 64-bit words. Each compression step of Keccak consists of 24 rounds. Let us denote the state matrix with A (indexing elements as A[x,y]), from which we derive further states in each round. Each round consists of the following 5 steps:
for all , where is the rotation offset defined for Keccak. The values in the table below are extracted from the Keccak reference: https://keccak.team/files/Keccak-reference-3.0.pdf
| x = 3 | x = 4 | x = 0 | x = 1 | x = 2 | |
|---|---|---|---|---|---|
| y = 2 | 153 | 231 | 3 | 10 | 171 |
| y = 1 | 55 | 276 | 36 | 300 | 6 |
| y = 0 | 28 | 91 | 0 | 1 | 190 |
| y = 4 | 120 | 78 | 210 | 66 | 253 |
| y = 3 | 21 | 136 | 105 | 45 | 15 |
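The offsets in the table can be reproduced from the triangular numbers (t+1)(t+2)/2 along the coordinate walk (x, y) → (y, 2x + 3y mod 5) given in the Keccak reference, starting at (1, 0):

```python
# Regenerate the (unreduced) Keccak rotation offsets from the
# triangular numbers along the pi-step coordinate walk.
def rotation_offsets():
    r = [[0] * 5 for _ in range(5)]  # r[x][y]; r[0][0] stays 0
    x, y = 1, 0
    for t in range(24):
        r[x][y] = (t + 1) * (t + 2) // 2
        x, y = y, (2 * x + 3 * y) % 5
    return r

r = rotation_offsets()
# spot-check against the table above
assert r[0][0] == 0 and r[1][0] == 1 and r[0][2] == 3
assert r[1][1] == 300 and r[4][1] == 276 and r[3][2] == 153
```

Since the words are 64 bits, the offsets actually used for rotation are these values reduced mod 64.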
Design Approach:
The atomic operations are XOR, ROT, NOT, AND. In the sections below, we will describe the gates for these operations. Below are some common approaches followed in their design.
To fit within 15 wires, we first decompose each word into its lower and upper 32-bit components. A gate for an atomic operation then operates on those 32-bit components.
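A sketch of the 32-bit decomposition for one atomic operation (plain Python, not the gate constraints; `split64` and `xor64` are hypothetical names):

```python
# Sketch: split a 64-bit word into lower and upper 32-bit halves so that
# an atomic operation (here XOR) can work on 32-bit components at a time.
MASK32 = (1 << 32) - 1

def split64(w):
    return w & MASK32, (w >> 32) & MASK32

def xor64(a, b):
    a_lo, a_hi = split64(a)
    b_lo, b_hi = split64(b)
    # operate on the halves, then recombine
    return (a_lo ^ b_lo) | ((a_hi ^ b_hi) << 32)

a, b = 0x0123456789ABCDEF, 0xFEDCBA9876543210
assert xor64(a, b) == a ^ b
```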
Before we describe the specific gate design approaches, below are some constraints in the Kimchi framework that dictated those approaches.
- only 4 lookups per row
- only first 7 columns are available to the permutation polynomial
Poseidon hash function
Poseidon is a hash function that can efficiently run in a zk circuit. (See poseidon-hash.info) It is based on the sponge function, with a state composed of field elements and a permutation based on field element operation (addition and exponentiation).
The permutation contains an S-box (exponentiation of a group element), adding constants to the state, and matrix multiplication of the state (multiplications and additions) with an MDS matrix.
Since a field element is around 255 bits, a single field element is enough as the capacity of the sponge to provide around 128-bit security.
We might want to piggy back on the zcash poseidon spec at some point (perhaps by making this an extension of the zcash poseidon spec).
APIs
We define a base sponge, and a scalar sponge. Both must be instantiated when verifying a proof (this is due to the recursion support baked into Kimchi).
External users of kimchi (or pickles) are most likely to interact with a wrap proof (see the pickles specification). As such, the sponges they need to instantiate are most likely to be instantiated with:
- Poseidon-Fp for base sponge
- Poseidon-Fq for the scalar sponge
Base Sponge
- `new(params) -> BaseSponge`. Creates a new base sponge.
- `BaseSponge.absorb(field_element)`. Absorbs a field element by calling the underlying sponge `absorb` function.
- `BaseSponge.absorb_point(point)`. Absorbs an elliptic curve point. If the point is the point at infinity, absorb two zeros. Otherwise, absorb the x and y coordinates with two calls to `absorb`.
- `BaseSponge.absorb_scalar(field_element_of_scalar_field)`. Absorbs a scalar.
  - If the scalar field is smaller than the base field (e.g. Fp is smaller than Fq), then the scalar is cast to a field element of the base field and absorbed via `absorb`.
  - Otherwise, the value is split between its least significant bit and the rest. Then both values are cast to field elements of the base field and absorbed via `absorb` (the high bits first, then the low bit).
- `BaseSponge.digest() -> field_element`. The `squeeze` function of the underlying sponge is called and the first field element is returned.
- `BaseSponge.digest_scalar() -> field_element_of_scalar_field`.
- `BaseSponge.challenge` // TODO: specify.
- `BaseSponge.challenge_fq` // TODO: specify.
Scalar Sponge
- `new(params) -> ScalarSponge`
- `ScalarSponge.absorb(scalar_field_element)`
- `ScalarSponge.digest() -> scalar_field_element`
- `ScalarSponge.challenge` // TODO: specify
Algorithms
Note that every operation is done in the field of the sponge.
In this section we define the high-level algorithm behind the permutation and the sponge. The permutation is never used directly by users; it is used only by the sponge function.
Permutation
In pseudo-code:
def sbox(field_element):
# modular exponentiation
return field_element^7
# apply MDS matrix
def apply_mds(state):
n = [0, 0, 0]
n[0] = state[0] * mds[0][0] + state[1] * mds[0][1] + state[2] * mds[0][2]
n[1] = state[0] * mds[1][0] + state[1] * mds[1][1] + state[2] * mds[1][2]
n[2] = state[0] * mds[2][0] + state[1] * mds[2][1] + state[2] * mds[2][2]
return n
# a round in the permutation
def apply_round(round, state):
    # sbox
    state[0] = sbox(state[0])
    state[1] = sbox(state[1])
    state[2] = sbox(state[2])
    # apply MDS matrix
    state = apply_mds(state)
    # add round constant
    state[0] += round_constants[round][0]
    state[1] += round_constants[round][1]
    state[2] += round_constants[round][2]
    return state

# the permutation
def permutation(state):
    round_offset = 0
    if ARK_INITIAL:
        constant = round_constants[0]
        state[0] += constant[0]
        state[1] += constant[1]
        state[2] += constant[2]
        round_offset = 1
    for round in range(round_offset, ROUNDS + round_offset):
        state = apply_round(round, state)
    return state
Sponge
In pseudo-code:
def new():
    return {
        "state": [0] * (RATE + CAPACITY),  # `RATE + CAPACITY` field elements set to 0
        "mode": "absorbing",
        "offset": 0,
    }

def absorb(sponge, field_element):
    # if we're changing mode, reset the offset
    if sponge.mode == "squeezing":
        sponge.mode = "absorbing"
        sponge.offset = 0
    # we reached the end of the rate, permute
    elif sponge.offset == RATE:
        sponge.state = permutation(sponge.state)
        sponge.offset = 0
    # absorb by adding to the state
    sponge.state[sponge.offset] += field_element
    sponge.offset += 1

def squeeze(sponge):
    # permute when changing mode or when we already squeezed everything
    if sponge.mode == "absorbing" or sponge.offset == RATE:
        sponge.mode = "squeezing"
        sponge.state = permutation(sponge.state)
        sponge.offset = 0
    result = sponge.state[sponge.offset]
    sponge.offset += 1
    return result
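To see the absorb/squeeze flow end to end, here is a runnable toy instantiation with made-up parameters: a tiny prime field and a dummy permutation standing in for Poseidon over the Pasta fields.

```python
# Toy walk-through of the sponge flow with invented parameters (NOT the
# real Kimchi constants): a small prime field and a dummy permutation.
P = 101  # toy field modulus
RATE = 2
CAPACITY = 1

def toy_permutation(state):
    # stand-in for the Poseidon permutation: mix the state nonlinearly
    return [(sum(state) + i + s**3) % P for i, s in enumerate(state)]

class Sponge:
    def __init__(self):
        self.state = [0] * (RATE + CAPACITY)
        self.mode = "absorbing"
        self.offset = 0

    def absorb(self, x):
        if self.mode == "squeezing":
            self.mode = "absorbing"
            self.offset = 0
        elif self.offset == RATE:
            self.state = toy_permutation(self.state)
            self.offset = 0
        self.state[self.offset] = (self.state[self.offset] + x) % P
        self.offset += 1

    def squeeze(self):
        if self.mode == "absorbing" or self.offset == RATE:
            self.mode = "squeezing"
            self.state = toy_permutation(self.state)
            self.offset = 0
        result = self.state[self.offset]
        self.offset += 1
        return result

s = Sponge()
s.absorb(3)
s.absorb(5)
digest = s.squeeze()
assert 0 <= digest < P
```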
Instantiations
We instantiate two versions of Poseidon, one for the field Fp, and one for the field Fq (see the pasta specification).
Both share the following sponge configuration:
- capacity 1
- rate: 2
and the following permutation configuration:
- number of full rounds: 55
- sbox: 7
- ARK_INITIAL: false
Poseidon-Fp
You can find the MDS matrix and round constants we use in /poseidon/src/pasta/fp_kimchi.rs.
Poseidon-Fq
You can find the MDS matrix and round constants we use in /poseidon/src/pasta/fq_kimchi.rs.
Test vectors
We have test vectors contained in /poseidon/src/tests/test_vectors/kimchi.json.
Pointers to the OCaml code
- our ocaml implementation: https://github.com/minaprotocol/mina/blob/develop/src/lib/random_oracle/random_oracle.mli
- relies on random_oracle_input: https://github.com/minaprotocol/mina/blob/develop/src/lib/random_oracle_input/random_oracle_input.ml
- is instantiated with two types of fields:
- https://github.com/minaprotocol/mina/blob/develop/src/nonconsensus/snark_params/snark_params_nonconsensus.ml
Kimchi Polynomial Commitment
Our polynomial commitment scheme is a bootleproof-type of commitment scheme, which is a mix of:
- The polynomial commitment scheme described in appendix A.1 of the PCD paper.
- The zero-knowledge opening described in section 3.1 of the HALO paper.
Pasta Curves
Pallas and Vesta (pa(llas ve)sta) are two curves created by the Zcash team. Each curve’s scalar field is the other curve’s base field, which is practical for recursion (see Pickles).
- supporting evidence
- mina parameters: https://github.com/o1-labs/proof-systems/tree/master/curves/src/pasta
- arkworks ark-pallas, pallas and vesta
Note that in general Fq refers to the base field (over which the curve is defined), while Fr refers to the scalar field (defined by the order of the curve). But in our code, because of the cycle:
- Fp refers to the base field of Pallas, and the scalar field of Vesta
- Fq refers to the base field of Vesta, and the scalar field of Pallas
Note that .
In pickles:
- Vesta is also referred to as the step curve, or the tick curve.
- Pallas is also referred to as the wrap curve, or the tock curve.
Pallas
- curve equation:
- base field:
- scalar field:
- mina generator:
- arkworks generator:
- endo:
endo_q = 2D33357CB532458ED3552A23A8554E5005270D29D19FC7D27B7FD22F0201B547
endo_r = 397E65A7D7C1AD71AEE24B27E308F0A61259527EC1D4752E619D1840AF55F1B1
You can use sage to test this:
Fp = GF(28948022309329048855892746252171976963363056481941560715954676764349967630337)
Pallas = EllipticCurve(Fp, [0, 5])
Pallas.count_points()
Vesta
- curve equation: $y^2 = x^3 + 5$
- base field: $q = 28948022309329048855892746252171976963363056481941647379679742748393362948097$
- scalar field: $p = 28948022309329048855892746252171976963363056481941560715954676764349967630337$
- mina generator:
- arkworks generator:
- endo:
endo_q = 06819A58283E528E511DB4D81CF70F5A0FED467D47C033AF2AA9D2E050AA0E4F
endo_r = 12CCCA834ACDBA712CAAD5DC57AAB1B01D1F8BD237AD31491DAD5EBDFDFE4AB9
You can use sage to test this:
Fq = GF(28948022309329048855892746252171976963363056481941647379679742748393362948097)
Vesta = EllipticCurve(Fq, [0, 5])
Vesta.count_points()
Kimchi
- This document specifies kimchi, a zero-knowledge proof system that’s a variant of PLONK.
- This document does not specify how circuits are created or executed, but only how to convert a circuit and its execution into a proof.
Table of content:
Overview
There are three main algorithms to kimchi:
- Setup: takes a circuit and produces a prover index, and a verifier index.
- Proof creation: takes the prover index, and the execution trace of the circuit to produce a proof.
- Proof verification: takes the verifier index and a proof to verify.
As part of these algorithms, a number of tables are created (and then converted into polynomials) to create a proof.
Tables used to describe a circuit
The following tables are created to describe the circuit:
Gates. A circuit is described by a series of gates, that we list in a table. The columns of the table list the gate types, while the rows are the length of the circuit. For each row, only a single gate can take the value 1 while all other gates take the value 0.
| row | Generic | Poseidon | CompleteAdd | VarBaseMul | EndoMul | EndoMulScalar | ChaCha0 | ChaCha1 | ChaCha2 | ChaChaFinal |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 1 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
Coefficients. The coefficient table has 15 columns, and is used to tweak the gates. Currently, only the Generic and the Poseidon gates use it (refer to their own sections to see how). All other gates set their values to 0.
| row | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | / | / | / | / | / | / | / | / | / | / | / | / | / | / | / |
Wiring (or Permutation, or sigmas). For gates to take the outputs of other gates as inputs, we use a wiring table to wire registers together.
To learn about registers, see the next section.
It is defined at every row, but only for the first 7 registers.
Each cell specifies a (row, column) tuple that it should be wired to. Cells that are not connected to another cell are wired to themselves.
Note that if three or more registers are wired together, they must form a cycle.
For example, if register (0, 4) is wired to both registers (80, 6) and (90, 0) then you would have the following table:
| row | 0 | 1 | 2 | 3 | 4 | 5 | 6 |
|---|---|---|---|---|---|---|---|
| 0 | 0,0 | 0,1 | 0,2 | 0,3 | 80,6 | 0,5 | 0,6 |
| … | |||||||
| 80 | 80,0 | 80,1 | 80,2 | 80,3 | 80,4 | 80,5 | 90,0 |
| … | |||||||
| 90 | 0,4 | 90,1 | 90,2 | 90,3 | 90,4 | 90,5 | 90,6 |
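The wiring-into-cycles rule can be sketched with a hypothetical helper (not part of the kimchi codebase):

```python
# Hypothetical helper (not the kimchi API): cells holding the same value are
# wired into a single cycle, each cell pointing at the next one.

def wire_cells(cells):
    """Return a wiring map where each (row, col) cell points to the next,
    closing the cycle back to the first cell."""
    return {cell: cells[(i + 1) % len(cells)] for i, cell in enumerate(cells)}

# the example above: register (0, 4) wired with (80, 6) and (90, 0)
cycle = wire_cells([(0, 4), (80, 6), (90, 0)])
```

This reproduces the table above: (0, 4) points to (80, 6), which points to (90, 0), which closes the cycle back to (0, 4).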
The lookup feature is currently optional, as it can add some overhead to the protocol. In the case where you would want to use lookups, the following tables would be needed:
Lookup Tables. The different lookup tables that are used in the circuit. For example, the XOR lookup table:
| l | r | o |
|---|---|---|
| 1 | 0 | 1 |
| 0 | 1 | 1 |
| 1 | 1 | 0 |
| 0 | 0 | 0 |
Lookup selectors. A lookup selector is used to perform a number of queries in different lookup tables. Any gate can advertise its use of a lookup selector (so a lookup selector can be associated to several gates), and on which rows they want to use them (current and/or next). In cases where a gate needs to use lookups in its current row only, and is the only one performing a specific combination of queries, then its gate selector can be used in place of a lookup selector. As with gates, lookup selectors (including gates used as lookup selectors) are mutually exclusive (only one can be used on a given row).
We currently have two lookup selectors:
| row | ChaChaQuery | ChaChaFinalQuery |
|---|---|---|
| 0 | 0 | 0 |
| 1 | 1 | 0 |
where each applies 4 queries. A query is a table describing which lookup table it queries, and the linear combination of the witness to use in the query. For example, the following table describes a query into the XOR table made out of linear combinations of registers (checking that $r_0 \oplus r_2 = 2 \cdot r_1$):
| table_id | l | r | o |
|---|---|---|---|
| XOR | 1, r0 | 1, r2 | 2, r1 |
Tables produced during proof creation
The following tables are created by the prover at runtime:
Registers (or Witness). Registers are also defined at every row, and are split into two types: the IO registers, from register 0 to register 6, usually contain the input or output of the gates (note that a gate can output a value on the next row as well).
I/O registers can be wired to each other (they’ll be forced to have the same value), no matter what row they’re on (for example, the register at row:0, col:4 can be wired to the register at row:80, col:6).
The rest of the registers, 7 through 14, are called advice registers as they can store values that are useful only for the row's active gate.
Think of them as intermediary or temporary values needed in the computation when the prover executes a circuit.
| row | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | / | / | / | / | / | / | / | / | / | / | / | / | / | / | / |
Wiring (Permutation) trace. You can think of the permutation trace as an extra register that is used to enforce the wiring specified in the wiring table. It is a single column that applies to all the rows as well, which the prover computes as part of a proof.
| row | pt |
|---|---|
| 0 | / |
Queries trace. These are the actual values made by queries, calculated by the prover at runtime, and used to construct the proof.
Table trace. Represents the concatenation of all the lookup tables, combined into a single column at runtime by both the prover and the verifier.
Sorted trace. Represents the processed (see the lookup section) concatenation of the queries trace and the table trace. It is produced at runtime by the prover. The sorted trace is long enough that it is split in several columns.
Lookup (aggregation, or permutation) trace. This is a one column table that is similar to the wiring (permutation) trace we talked above. It is produced at runtime by the prover.
Dependencies
To specify kimchi, we rely on a number of primitives that are specified outside of this specification. In this section we list these specifications, as well as the interfaces we make use of in this specification.
Polynomial Commitments
Refer to the specification on polynomial commitments. We make use of the following functions from that specification:
- PolyCom.non_hiding_commit(poly) -> PolyCom::NonHidingCommitment
- PolyCom.commit(poly) -> PolyCom::HidingCommitment
- PolyCom.evaluation_proof(poly, commitment, point) -> EvaluationProof
- PolyCom.verify(commitment, point, evaluation, evaluation_proof) -> bool
Poseidon hash function
Refer to the specification on Poseidon. We make use of the following functions from that specification:
- Poseidon.init(params) -> FqSponge
- Poseidon.update(field_elem)
- Poseidon.finalize() -> FieldElem
We specify the following functions on top:
- Poseidon.produce_challenge() (TODO: uses the endomorphism)
- Poseidon.to_fr_sponge() -> state_of_fq_sponge_before_eval, FrSponge
With the current parameters:
- S-Box alpha: 7
- Width: 3
- Rate: 2
- Full rounds: 55
- Round constants: fp_kimchi, fq_kimchi
- MDS matrix: fp_kimchi, fq_kimchi
Pasta
Kimchi is made to work on cycles of curves, so the protocol switches between two fields Fq and Fr, where Fq represents the base field and Fr represents the scalar field.
See the Pasta curves specification.
Constraints
Kimchi enforces the correct execution of a circuit by creating a number of constraints and combining them together. In this section, we describe all the constraints that make up the main polynomial once combined.
We define the following functions:
combine_constraints(range_alpha, constraints), which takes a range of contiguous powers of alpha and a number of constraints. It returns the sum of all the constraints, where each constraint has been multiplied by a power of alpha. In other words it returns $\sum_i \alpha^{\text{offset} + i} \cdot \text{constraint}_i$.
The different ranges of alpha are described as follows:
- gates. Offset starts at 0 and 21 powers of $\alpha$ are used
- Permutation. Offset starts at 21 and 3 powers of $\alpha$ are used
As gates are mutually exclusive (a single gate is used on each row), we can reuse the same range of powers of alpha across all the gates.
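As a sketch, the combination can be modeled over a toy modulus (a hypothetical helper, mirroring the offsets described above):

```python
# Sketch of combine_constraints over a toy modulus: each constraint evaluation
# is scaled by a consecutive power of alpha, starting at the range's offset.

def combine_constraints(alpha, offset, constraints, modulus):
    return sum(
        pow(alpha, offset + i, modulus) * c for i, c in enumerate(constraints)
    ) % modulus

# gates use offset 0; the permutation constraints use offset 21
combined = combine_constraints(2, 0, [1, 1, 1], 97)
```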
TODO: linearization
Permutation
The permutation constraints are the following 4 constraints:
The two sides of the coin (with ):
and
the initialization of the accumulator:
and the accumulator’s final value:
You can read more about why it looks like that in this post.
The quotient contribution of the permutation is split into two parts, perm and bnd. They will be used by the prover.
and bnd:
The linearization:
where is computed as:
To compute the permutation aggregation polynomial, the prover interpolates the polynomial that has the following evaluations. The first evaluation represents the initial value of the accumulator: $z(g^0) = 1$. For the following rows, where $n$ is the size of the domain, evaluations are computed as:
with
and
If computed correctly, we should have $z(g^{n-3}) = 1$.
Finally, randomize the last EVAL_POINTS evaluations $z(g^{n-2})$ and $z(g^{n-1})$,
in order to add zero-knowledge to the protocol.
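As a toy model, the accumulator computation can be sketched in Python. The per-column identity and sigma values below are placeholders, not kimchi's actual shift constants, and the zero-knowledge rows are ignored:

```python
# Toy model of the permutation accumulator z (PLONK-style). z starts at 1 and
# each step multiplies by the ratio of the "identity" side over the "permuted"
# side; with sigma equal to the identity, every ratio is 1.

def accumulator(witness_cols, ids, sigmas, beta, gamma, modulus):
    n = len(witness_cols[0])
    z = [1]  # initial value of the accumulator
    for i in range(n - 1):
        num, den = 1, 1
        for col, w in enumerate(witness_cols):
            num = num * (w[i] + beta * ids[col][i] + gamma) % modulus
            den = den * (w[i] + beta * sigmas[col][i] + gamma) % modulus
        z.append(z[-1] * num * pow(den, -1, modulus) % modulus)
    return z

# with sigma = identity (nothing is wired), the accumulator stays at 1
z = accumulator([[3, 4, 5]], [[1, 2, 3]], [[1, 2, 3]], beta=7, gamma=11, modulus=97)
```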
Lookup
Lookups in kimchi allow you to check if a single value, or a series of values, are part of a table. The first case is useful for checking whether a value belongs to a range (from 0 to 1,000, for example), whereas the second case is useful for checking truth tables (for example, checking that three values can be found in the rows of an XOR table) or for writing to and reading from a memory vector (where one column is an index, and the other is the value stored at that index).
Similarly to the generic gate, each value taking part in a lookup can be scaled with a fixed field element.
The lookup functionality is an opt-in feature of kimchi that can be used by custom gates. From the user’s perspective, not using any gates that make use of lookups means that the feature will be disabled and there will be no overhead to the protocol.
Refer to the lookup RFC for an overview of the lookup feature.
In this section, we describe the tables kimchi supports, as well as the different lookup selectors (and their associated queries)
The Lookup Tables
Kimchi currently defines the following lookup tables:
/// The table ID associated with the XOR lookup table.
pub const XOR_TABLE_ID: i32 = 0;
/// The range check table ID.
pub const RANGE_CHECK_TABLE_ID: i32 = 1;
XOR. The lookup table for 4-bit xor.
Note that it is constructed so that (0, 0, 0) is the last position in the table.
This is because tables are extended to the full size of a column (essentially)
by padding them with their final value. And, having the value (0, 0, 0) here means
that when we commit to this table and use the dummy value in the lookup_sorted
columns, those entries that have the dummy value of (0, 0, 0)
will translate into a scalar multiplication by 0, which is free.
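A sketch of how such a table can be built, with (0, 0, 0) forced into the last position (the exact row ordering here is an assumption for illustration; see the code for the real construction):

```python
# Sketch of building the 4-bit XOR lookup table with (0, 0, 0) as its last
# entry, so the padding/dummy rows commit to zero (free scalar multiplications).

def xor_table(bits=4):
    rows = [(l, r, l ^ r) for l in range(1 << bits) for r in range(1 << bits)]
    rows.remove((0, 0, 0))
    rows.append((0, 0, 0))  # dummy/padding value goes last
    return rows

table = xor_table()
```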
The Lookup Selectors
ChaChaSelector. Performs 4 queries to the XOR lookup table.
| l | r | o | - | l | r | o | - | l | r | o | - | l | r | o |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1, r3 | 1, r7 | 1, r11 | - | 1, r4 | 1, r8 | 1, r12 | - | 1, r5 | 1, r9 | 1, r13 | - | 1, r6 | 1, r10 | 1, r14 |
ChaChaFinalSelector. Performs 4 different queries to the XOR lookup table. (TODO: specify the layout)
Producing the sorted table as the prover
Because of our ZK-rows, we can’t do the trick in the plookup paper of wrapping around to enforce consistency between the sorted lookup columns.
Instead, we arrange the LookupSorted table into columns in a snake-shape.
Like so,
_ _
| | | | |
| | | | |
|_| |_| |
or, imagining the full sorted array is [ s0, ..., s8 ], like
s0 s4 s4 s8
s1 s3 s5 s7
s2 s2 s6 s6
So the direction ("increasing" or "decreasing", relative to LookupTable) is
if i % 2 = 0 { Increasing } else { Decreasing }
Then, for each i < max_lookups_per_row, if i % 2 = 0, we enforce that the
last element of LookupSorted(i) = last element of LookupSorted(i + 1),
and if i % 2 = 1, we enforce that
the first element of LookupSorted(i) = first element of LookupSorted(i + 1).
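The snake arrangement can be sketched as follows (a standalone illustration of the layout, not the prover code):

```python
# Sketch of the snake layout: slice the sorted values into overlapping columns,
# reversing odd columns, so that adjacent columns share their boundary element
# (last elements for even i, first elements for odd i).

def snake_columns(sorted_values, num_columns):
    size = (len(sorted_values) + num_columns - 1) // num_columns
    cols = []
    for i in range(num_columns):
        start = i * (size - 1)  # overlap by one element at each boundary
        col = sorted_values[start:start + size]
        cols.append(col[::-1] if i % 2 == 1 else col)  # odd columns decrease
    return cols

cols = snake_columns(list(range(9)), 4)  # the [ s0, ..., s8 ] example above
```

Reading the columns top to bottom gives s0 s1 s2, s4 s3 s2, s4 s5 s6, s8 s7 s6, matching the snake drawn above.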
Gates
A circuit is described as a series of gates. In this section we describe the different gates currently supported by kimchi, the constraints associated to them, and the way the register table, coefficient table, and permutation can be used in conjunction.
TODO: for each gate describe how to create it?
Double Generic Gate
The double generic gate contains two generic gates.
A generic gate is simply the 2-fan-in gate specified in the vanilla PLONK protocol that allows us to do operations like:
- addition of two registers (into an output register)
- or multiplication of two registers
- equality of a register with a constant
More generally, the generic gate controls the coefficients in the equation: $c_l \cdot l + c_r \cdot r + c_o \cdot o + c_m \cdot (l \times r) + c_c = 0$
The layout of the gate is the following:
| 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| l1 | r1 | o1 | l2 | r2 | o2 |
where l1, r1, and o1 (resp. l2, r2, o2) are the left, right, and output registers of the first (resp. second) generic gate.
The selectors are stored in the coefficient table as:
| 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| l1 | r1 | o1 | m1 | c1 | l2 | r2 | o2 | m2 | c2 |
with m1 (resp. m2) the mul selector for the first (resp. second) gate, and c1 (resp. c2) the constant selector for the first (resp. second) gate.
The constraints:
where the $c_i$ are the coefficients.
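A sketch of one half of the double generic gate over a toy modulus, assuming the constraint has the form c_l·l + c_r·r + c_o·o + c_m·(l·r) + c_c = 0:

```python
# Sketch of a single generic-gate constraint check over a toy modulus.
# The selectors (c_l, c_r, c_o, c_m, c_c) come from the coefficient table.

def generic_constraint(l, r, o, c_l, c_r, c_o, c_m, c_c, modulus):
    return (c_l * l + c_r * r + c_o * o + c_m * l * r + c_c) % modulus

# addition gate (l + r - o = 0): selectors (1, 1, -1, 0, 0)
add_ok = generic_constraint(3, 4, 7, 1, 1, -1, 0, 0, 101)
# multiplication gate (l * r - o = 0): selectors (0, 0, -1, 1, 0)
mul_ok = generic_constraint(3, 4, 12, 0, 0, -1, 1, 0, 101)
```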
Poseidon
The poseidon gate encodes 5 rounds of the poseidon permutation.
A state is represented by 3 field elements. For example,
the first state is represented by (s0, s0, s0),
and the next state, after permutation, is represented by (s1, s1, s1).
Below is how we store each state in the register table:
| 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| s0 | s0 | s0 | s4 | s4 | s4 | s1 | s1 | s1 | s2 | s2 | s2 | s3 | s3 | s3 |
| s5 | s5 | s5 |
The last state is stored on the next row. This last state is either used:
- with another Poseidon gate on that next row, representing the next 5 rounds.
- or with a Zero gate, and a permutation to use the output elsewhere in the circuit.
- or with another gate expecting an input of 3 field elements in its first registers.
As some of the poseidon hash variants might not use $5k$ rounds (for some $k$), the result of the 4-th round is stored directly after the initial state. This makes that state accessible to the permutation.
We define $M_{r, c}$ as the MDS matrix at row $r$ and column $c$.
We define the S-box operation as $x^S$ for the SPONGE_BOX constant $S$ (currently 7).
We store the 15 round constants required for the 5 rounds (3 per round) in the coefficient table:
| 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| r0 | r1 | r2 | r3 | r4 | r5 | r6 | r7 | r8 | r9 | r10 | r11 | r12 | r13 | r14 |
The initial state, stored in the first three registers, is not constrained. The following 4 states (of 3 field elements), including 1 in the next row, are constrained to represent the 5 rounds of permutation. Each of the associated 15 registers is associated to a constraint, calculated as:
first round:
second round:
third round:
fourth round:
fifth round:
where is the polynomial which points to the next row.
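A toy sketch of one full round, using the parameters listed earlier (alpha = 7, width 3) but a placeholder prime, MDS matrix, and round constants rather than the real fp_kimchi/fq_kimchi values:

```python
# Toy sketch of one full Poseidon round: S-box each element (x^7), apply the
# MDS matrix, then add the round constants. All parameters are placeholders.

P = 101  # placeholder prime, NOT a Pasta field modulus

def poseidon_round(state, mds, round_constants):
    sboxed = [pow(s, 7, P) for s in state]  # S-box with alpha = 7
    return [
        (sum(mds[row][col] * sboxed[col] for col in range(3))
         + round_constants[row]) % P
        for row in range(3)
    ]

MDS = [[1, 2, 3], [4, 5, 6], [7, 8, 10]]  # placeholder invertible matrix
state = poseidon_round([1, 2, 3], MDS, [9, 9, 9])
```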
Chacha
There are four chacha constraint types, corresponding to the four lines in each quarter round.
a += b; d ^= a; d <<<= 16;
c += d; b ^= c; b <<<= 12;
a += b; d ^= a; d <<<= 8;
c += d; b ^= c; b <<<= 7;
or, written without mutation, (and where + is mod $2^{32}$),
a' = a + b ; d' = (d ⊕ a') <<< 16;
c' = c + d'; b' = (b ⊕ c') <<< 12;
a'' = a' + b'; d'' = (d' ⊕ a'') <<< 8;
c'' = c' + d''; b'' = (c'' ⊕ b') <<< 7;
We lay each line as two rows.
Each line has the form
x += z; y ^= x; y <<<= k
or without mutation,
x' = x + z; y' = (y ⊕ x') <<< k
which we abbreviate as
L(x, x’, y, y’, z, k)
In general, such a line will be laid out as the two rows
| 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| x | y | z | (y^x’)_0 | (y^x’)_1 | (y^x’)_2 | (y^x’)_3 | (x+z)_0 | (x+z)_1 | (x+z)_2 | (x+z)_3 | y_0 | y_1 | y_2 | y_3 |
| x’ | y’ | (x+z)_8 | (y^x’)_4 | (y^x’)_5 | (y^x’)_6 | (y^x’)_7 | (x+z)_4 | (x+z)_5 | (x+z)_6 | (x+z)_7 | y_4 | y_5 | y_6 | y_7 |
where A_i indicates the i^th nybble (four-bit chunk) of the value A.
(x+z)_8 is special, since we know it is actually at most 1 bit (representing the overflow bit of x + z).
So the first line L(a, a', d, d', b, 16) for example becomes the two rows
| 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| a | d | b | (d^a’)_0 | (d^a’)_1 | (d^a’)_2 | (d^a’)_3 | (a+b)_0 | (a+b)_1 | (a+b)_2 | (a+b)_3 | d_0 | d_1 | d_2 | d_3 |
| a’ | d’ | (a+b)_8 | (d^a’)_4 | (d^a’)_5 | (d^a’)_6 | (d^a’)_7 | (a+b)_4 | (a+b)_5 | (a+b)_6 | (a+b)_7 | d_4 | d_5 | d_6 | d_7 |
along with the equations
- (x+z)_8 * ((x+z)_8 - 1) = 0 (booleanity check on the carry bit)
The layout rotates the nybbles left by 4 positions, which means bit-rotating by 16 as desired.
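The nybble trick can be checked directly in Python (a standalone illustration, not part of the witness generation):

```python
# A 32-bit word split into eight 4-bit nybbles (little-endian); rotating the
# nybble list left by 4 positions is the same as bit-rotating the word by 16.

def to_nybbles(word):
    return [(word >> (4 * i)) & 0xF for i in range(8)]  # little-endian nybbles

def from_nybbles(nybbles):
    return sum(n << (4 * i) for i, n in enumerate(nybbles))

def rotl32(word, k):
    return ((word << k) | (word >> (32 - k))) & 0xFFFFFFFF

w = 0x12345678
nybbles = to_nybbles(w)
rotated = from_nybbles(nybbles[-4:] + nybbles[:-4])  # rotate nybbles by 4
```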
The final line is a bit more complicated as we have to rotate by 7, which is not a multiple of 4. We accomplish this as follows.
Let’s say we want to rotate the nybbles left by 7. First we’ll rotate left by 4 to get
Rename these as
We now want to left-rotate each by 3.
Let be the low bit of . Then, the low 3 bits of are .
The result will thus be
or re-writing in terms of our original nybbles ,
For neatness, letting , the first 2 rows for the final line will be:
| 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| x | y | z | (y^x’)_0 | (y^x’)_1 | (y^x’)_2 | (y^x’)_3 | (x+z)_0 | (x+z)_1 | (x+z)_2 | (x+z)_3 | y_0 | y_1 | y_2 | y_3 |
| x’ | _ | (x+z)_8 | (y^x’)_4 | (y^x’)_5 | (y^x’)_6 | (y^x’)_7 | (x+z)_4 | (x+z)_5 | (x+z)_6 | (x+z)_7 | y_4 | y_5 | y_6 | y_7 |
but then we also need to perform the bit-rotate by 1.
For this we’ll add an additional 2 rows. It’s probably possible to do it with just 1, but I think we’d have to change our plookup setup somehow, or maybe expand the number of columns, or allow access to the previous row.
Let lo(n) be the low bit of the nybble n. The 2 rows will be
| 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| y’ | (y^x’)_0 | (y^x’)_1 | (y^x’)_2 | (y^x’)_3 | lo((y^x’)_0) | lo((y^x’)_1) | lo((y^x’)_2) | lo((y^x’)_3) | ||||||
| _ | (y^x’)_4 | (y^x’)_5 | (y^x’)_6 | (y^x’)_7 | lo((y^x’)_4) | lo((y^x’)_5) | lo((y^x’)_6) | lo((y^x’)_7) |
On each of them we’ll do the plookups
((cols[1] - cols[5])/2, (cols[1] - cols[5])/2, 0) in XOR
((cols[2] - cols[6])/2, (cols[2] - cols[6])/2, 0) in XOR
((cols[3] - cols[7])/2, (cols[3] - cols[7])/2, 0) in XOR
((cols[4] - cols[8])/2, (cols[4] - cols[8])/2, 0) in XOR
which checks that ((y^x')_i - lo((y^x')_i)) / 2 is a nybble, which guarantees that the low bit is computed correctly.
There is no need to check nybbleness of (y^x’)_i because those will be constrained to be equal to the copies of those values from previous rows, which have already been constrained for nybbleness (by the lookup in the XOR table).
And we’ll check that y’ is the sum of the shifted nybbles.
Elliptic Curve Addition
The layout is
| 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 |
|---|---|---|---|---|---|---|---|---|---|---|
| x1 | y1 | x2 | y2 | x3 | y3 | inf | same_x | s | inf_z | x21_inv |
where
- (x1, y1), (x2, y2) are the inputs and (x3, y3) the output.
- inf is a boolean that is true iff the result (x3, y3) is the point at infinity.
The rest of the values are inaccessible from the permutation argument, but
- same_x is a boolean that is true iff x1 == x2.
The following constraints are generated:
constraint 1:
constraint 2:
constraint 3:
constraint 4:
constraint 5:
constraint 6:
constraint 7:
Endo Scalar
We give constraints for the endomul scalar computation.
Each row corresponds to 8 iterations of the inner loop in “Algorithm 2” on page 29 of the Halo paper.
The state of the algorithm that’s updated across iterations of the loop is (a, b).
It’s clear from that description of the algorithm that an iteration of the loop can
be written as
(a, b, i) ->
( 2 * a + c_func(r_{2 * i}, r_{2 * i + 1}),
2 * b + d_func(r_{2 * i}, r_{2 * i + 1}) )
for some functions c_func and d_func. If one works out what these functions are on
every input (thinking of a two-bit input as a number in $\{0, 1, 2, 3\}$), one finds they
are given by
c_func(x), defined by
- c_func(0) = 0
- c_func(1) = 0
- c_func(2) = -1
- c_func(3) = 1
d_func(x), defined by
- d_func(0) = -1
- d_func(1) = 1
- d_func(2) = 0
- d_func(3) = 0
One can then interpolate to find polynomials that implement these functions on $\{0, 1, 2, 3\}$.
You can use sage, as
R = PolynomialRing(QQ, 'x')
c_func = R.lagrange_polynomial([(0, 0), (1, 0), (2, -1), (3, 1)])
d_func = R.lagrange_polynomial([(0, -1), (1, 1), (2, 0), (3, 0)])
Then, c_func is given by
2/3 * x^3 - 5/2 * x^2 + 11/6 * x
and d_func is given by
2/3 * x^3 - 7/2 * x^2 + 29/6 * x - 1 <=> c_func + (-x^2 + 3x - 1)
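You can check the interpolation without sage, using exact rational arithmetic:

```python
# Verify the interpolated crumb polynomials on {0, 1, 2, 3} against the
# c_func/d_func tables above. Fraction keeps the coefficients exact.

from fractions import Fraction

def c_func(x):
    x = Fraction(x)
    return Fraction(2, 3) * x**3 - Fraction(5, 2) * x**2 + Fraction(11, 6) * x

def d_func(x):
    # d_func = c_func + (-x^2 + 3x - 1)
    return c_func(x) - x**2 + 3 * x - 1

c_values = [c_func(x) for x in range(4)]
d_values = [d_func(x) for x in range(4)]
```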
We lay out the witness as
| 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 | Type |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| n0 | n8 | a0 | b0 | a8 | b8 | x0 | x1 | x2 | x3 | x4 | x5 | x6 | x7 | ENDO |
where each xi is a two-bit “crumb”.
We also use a polynomial to check that each xi is indeed in $\{0, 1, 2, 3\}$,
which can be done by checking that each is a root of the polynomial below:
crumb(x)
= x (x - 1) (x - 2) (x - 3)
= x^4 - 6*x^3 + 11*x^2 - 6*x
= x *(x^3 - 6*x^2 + 11*x - 6)
Each iteration performs the following computations
- Update n: n_{i+1} = 2 * n_i + x_i
- Update a: a_{i+1} = 2 * a_i + c_func(x_i)
- Update b: b_{i+1} = 2 * b_i + d_func(x_i)
Then, after the 8 iterations, we compute expected values of the above operations as:
- expected_n8 := 2 * ( 2 * ( 2 * ( 2 * ( 2 * ( 2 * ( 2 * (2 * n0 + x0) + x1 ) + x2 ) + x3 ) + x4 ) + x5 ) + x6 ) + x7
- expected_a8 := 2 * ( 2 * ( 2 * ( 2 * ( 2 * ( 2 * ( 2 * (2 * a0 + c0) + c1 ) + c2 ) + c3 ) + c4 ) + c5 ) + c6 ) + c7
- expected_b8 := 2 * ( 2 * ( 2 * ( 2 * ( 2 * ( 2 * ( 2 * (2 * b0 + d0) + d1 ) + d2 ) + d3 ) + d4 ) + d5 ) + d6 ) + d7
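The expected_* expressions are just the update rule folded eight times, which a small sketch makes explicit (example values are arbitrary):

```python
# Folding acc -> 2*acc + term eight times reproduces the expected_n8 /
# expected_a8 / expected_b8 expressions (terms given most-significant first).

def accumulate(start, terms):
    acc = start
    for t in terms:
        acc = 2 * acc + t
    return acc

expected_n8 = accumulate(5, [1, 0, 1, 1, 0, 0, 1, 0])  # example n0 = 5
```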
Putting together all of the above, these are the 11 constraints for this gate:
- Checking the values after the 8 iterations:
  - Constrain n: 0 = expected_n8 - n8
  - Constrain a: 0 = expected_a8 - a8
  - Constrain b: 0 = expected_b8 - b8
- Checking the crumbs, meaning each xi is indeed in the range $\{0, 1, 2, 3\}$:
  - Constrain x0: 0 = x0 * ( x0^3 - 6 * x0^2 + 11 * x0 - 6 )
  - Constrain x1: 0 = x1 * ( x1^3 - 6 * x1^2 + 11 * x1 - 6 )
  - Constrain x2: 0 = x2 * ( x2^3 - 6 * x2^2 + 11 * x2 - 6 )
  - Constrain x3: 0 = x3 * ( x3^3 - 6 * x3^2 + 11 * x3 - 6 )
  - Constrain x4: 0 = x4 * ( x4^3 - 6 * x4^2 + 11 * x4 - 6 )
  - Constrain x5: 0 = x5 * ( x5^3 - 6 * x5^2 + 11 * x5 - 6 )
  - Constrain x6: 0 = x6 * ( x6^3 - 6 * x6^2 + 11 * x6 - 6 )
  - Constrain x7: 0 = x7 * ( x7^3 - 6 * x7^2 + 11 * x7 - 6 )
Endo Scalar Multiplication
We implement custom gate constraints for short Weierstrass curve endomorphism optimised variable base scalar multiplication.
Given a finite field $\mathbb{F}_q$ of order $q$, if the order is not a multiple of 2 nor 3, then an elliptic curve over $\mathbb{F}_q$ in short Weierstrass form is represented by the set of points $(x, y)$ that satisfy the equation $y^2 = x^3 + ax + b$, with $a, b \in \mathbb{F}_q$ and $4a^3 + 27b^2 \neq 0$. If $P$ and $T$ are two points in the curve $E(\mathbb{F}_q)$, the goal of this operation is to perform the operation $2P \pm T$ efficiently as:
S = (P + (b ? T : −T)) + P
The same algorithm can be used to perform other scalar multiplications, meaning it is not restricted to the case , but it can be used for any arbitrary . This is done by decomposing the scalar into its binary representation. Moreover, for every step, there will be a one-bit constraint meant to differentiate between addition and subtraction for the operation :
In particular, the constraints of this gate take care of 4 bits of the scalar within a single EVBSM row. When the scalar is longer (which will usually be the case), multiple EVBSM rows will be concatenated.
| Row | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 | Type |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| i | xT | yT | Ø | Ø | xP | yP | n | xR | yR | s1 | s3 | b1 | b2 | b3 | b4 | EVBSM |
| i+1 | = | = | xS | yS | n’ | xR’ | yR’ | s1’ | s3’ | b1’ | b2’ | b3’ | b4’ | EVBSM |
The layout of this gate (and the next row) allows for this chained behavior where the output point of the current row gets accumulated as one of the inputs of the following row. Similarly, the scalar is decomposed into binary form and n (respectively n') stores the current accumulated value and the next one for the check.
For readability, we define the following variables for the constraints:
endo (the EndoCoefficient), xq1, xq2, yq1 and yq2.
These are the 11 constraints that correspond to each EVBSM gate, which take care of 4 bits of the scalar within a single EVBSM row:
- First block:
  - (xq1 - xp) * s1 = yq1 - yp
  - (2 * xp – s1^2 + xq1) * ((xp – xr) * s1 + yr + yp) = (xp – xr) * 2 * yp
  - (yr + yp)^2 = (xp – xr)^2 * (s1^2 – xq1 + xr)
- Second block:
  - (xq2 - xr) * s3 = yq2 - yr
  - (2 * xr – s3^2 + xq2) * ((xr – xs) * s3 + ys + yr) = (xr – xs) * 2 * yr
  - (ys + yr)^2 = (xr – xs)^2 * (s3^2 – xq2 + xs)
- Booleanity checks:
  - Bit flag b1: 0 = b1 * (b1 - 1)
  - Bit flag b2: 0 = b2 * (b2 - 1)
  - Bit flag b3: 0 = b3 * (b3 - 1)
  - Bit flag b4: 0 = b4 * (b4 - 1)
- Binary decomposition:
  - Accumulated scalar: n_next = 16 * n + 8 * b1 + 4 * b2 + 2 * b3 + b4
The constraints above are derived from the following EC Affine arithmetic equations:
- (1) =>
- (2&3) =>
- (2) =>
- <=>
- (3) =>
- <=>
- (4) =>
- (5&6) =>
- (5) =>
- <=>
- (6) =>
- <=>
Defining and as
Gives the following equations when substituting the values of and :
- (xq1 - xp) * s1 = (2 * b1 - 1) * yt - yp
- (2 * xp – s1^2 + xq1) * ((xp – xr) * s1 + yr + yp) = (xp – xr) * 2 * yp
- (yr + yp)^2 = (xp – xr)^2 * (s1^2 – xq1 + xr)
- (xq2 - xr) * s3 = (2 * b2 - 1) * yt - yr
- (2 * xr – s3^2 + xq2) * ((xr – xs) * s3 + ys + yr) = (xr – xs) * 2 * yr
- (ys + yr)^2 = (xr – xs)^2 * (s3^2 – xq2 + xs)
Scalar Multiplication
We implement custom Plonk constraints for short Weierstrass curve variable base scalar multiplication.
Given a finite field $\mathbb{F}_q$ of order $q$, if the order is not a multiple of 2 nor 3, then an elliptic curve over $\mathbb{F}_q$ in short Weierstrass form is represented by the set of points $(x, y)$ that satisfy the equation $y^2 = x^3 + ax + b$, with $a, b \in \mathbb{F}_q$ and $4a^3 + 27b^2 \neq 0$. If $P$ and $T$ are two points in the curve $E(\mathbb{F}_q)$, the algorithm we represent here computes the operation $2P \pm T$ (point doubling and point addition) as $S = (P \pm T) + P$.
The original algorithm that is being used can be found in Section 3.1 of https://arxiv.org/pdf/math/0208038.pdf, which can perform the above operation using 1 multiplication, 2 squarings and 2 divisions, thanks to the fact that computing the y-coordinate of the intermediate addition is not required. This is more efficient than the standard algorithm, which requires 1 more multiplication, 3 squarings in total and 2 divisions.
Moreover, this algorithm can be applied not only to the operation $2P \pm T$, but to any other scalar multiplication $kP$. This can be done by expressing the scalar $k$ in bitwise form and performing a double-and-add approach. Nonetheless, this requires conditionals to differentiate $T$ from $-T$. For that reason, we will implement the following pseudocode from https://github.com/zcash/zcash/issues/3924 (where instead, they give a variant of the above efficient algorithm for Montgomery curves).
Acc := [2]T
for i = n-1 ... 0:
    Q := (r_i == 1) ? T : -T
    Acc := Acc + (Q + Acc)
return (d_0 == 0) ? Acc - P : Acc
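To see which multiple of T the loop computes, one can model it with integers standing in for curve points (this models the control flow only, not the elliptic-curve arithmetic; the final conditional subtraction is omitted):

```python
# Toy model of the double-and-add loop: with T = 1 and integer addition in
# place of point addition, the loop's result is the scalar multiple computed.

def scalar_from_bits(bits):
    """Run the loop body with T = 1, bits given most-significant first."""
    T = 1
    acc = 2 * T                     # Acc := [2]T
    for r in bits:
        q = T if r == 1 else -T     # Q := (r_i == 1) ? T : -T
        acc = acc + (q + acc)       # Acc := Acc + (Q + Acc)
    return acc

# with n zero bits the loop computes 2^(n+1) - (2^n - 1); for n = 5 that is 33
scalar = scalar_from_bits([0, 0, 0, 0, 0])
```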
The layout of the witness requires 2 rows.
The i-th row will be a VBSM gate whereas the next row will be a ZERO gate.
| Row | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 | Type |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| i | xT | yT | x0 | y0 | n | n’ | x1 | y1 | x2 | y2 | x3 | y3 | x4 | y4 | VBSM | |
| i+1 | x5 | y5 | b0 | b1 | b2 | b3 | b4 | s0 | s1 | s2 | s3 | s4 | ZERO |
The gate constraints take care of 5 bits of the scalar multiplication.
Each single bit consists of 4 constraints.
There is one additional constraint imposed on the final number.
Thus, the VarBaseMul gate argument requires 21 constraints.
For every bit, there will be one constraint meant to differentiate between addition and subtraction for the operation :
S = (P + (b ? T : −T)) + P
We follow this criteria:
- If the bit is 1, the sign should be an addition
- If the bit is 0, the sign should be a subtraction
Then, paraphrasing the above, we will represent this behavior as:
S = (P + (2 * b - 1) * T) + P
Let us call Input the point with coordinates (xI, yI) and
Target the point being added with coordinates (xT, yT).
Then Output will be the point with coordinates (xO, yO) resulting from O = ( I ± T ) + I
Do not confuse our Output point (xO, yO) with the point at infinity that is normally represented as $\mathcal{O}$.
In each step of the algorithm, we consider the following elliptic curves affine arithmetic equations:
For readability, we define the following 3 variables
in such a way that the second slope can be expressed as u / t:
- rx = s1^2 - xI - xT
- t = xI - rx
- u = 2 * yI - t * s1
Next, for each bit in the algorithm, we create the following 4 constraints that derive from the above:
- Booleanity check on the bit b: 0 = b * b - b
- Constrain s1: (xI - xT) * s1 = yI – (2b - 1) * yT
- Constrain the Output x-coordinate xO: 0 = u^2 - t^2 * (xO - xT + s1^2)
- Constrain the Output y-coordinate yO: 0 = (yO + yI) * t - (xI - xO) * u
When applied to the 5 bits, the value of the Target point (xT, yT) is maintained,
whereas the values for the Input and Output points form the chain:
[(x0, y0) -> (x1, y1) -> (x2, y2) -> (x3, y3) -> (x4, y4) -> (x5, y5)]
Similarly, 5 different s0..s4 are required, just like the 5 bits b0..b4.
Finally, the additional constraint makes sure that the scalar is being correctly expressed into its binary form (using the double-and-add decomposition) as n' = 32 * n + 16 * b0 + 8 * b1 + 4 * b2 + 2 * b3 + b4. This equation is translated as the constraint:
- Binary decomposition:
0 = n' - (b4 + 2 * (b3 + 2 * (b2 + 2 * (b1 + 2 * (b0 + 2*n)))))
Range Check
The range check gadget is comprised of three circuit gates (RangeCheck0, RangeCheck1
and Zero) and can perform range checks on three values v0, v1 and v2 of up to 88 bits each.
Optionally, RangeCheck0 can be used on its own to perform 64-bit range checks by
constraining witness cells 1-2 to zero.
Byte-order:
- Each cell value is in little-endian byte order
- Limbs are mapped to columns in big-endian order (i.e. the lowest columns contain the highest bits)
- We also have the highest bits covered by copy constraints and plookups, so that we can constrain the highest two limbs to zero and obtain a 64-bit range check, which is envisioned to be a common case
The values are decomposed into limbs as follows.
- L is a 12-bit lookup (or copy) limb,
- C is a 2-bit "crumb" limb (we call half a nybble a crumb).
<----6----> <------8------>
v0 = L L L L L L C C C C C C C C
v1 = L L L L L L C C C C C C C C
<2> <--4--> <---------------18---------------->
v2 = C C L L L L C C C C C C C C C C C C C C C C C C
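A sketch of the v0/v1 decomposition (high-to-low: six 12-bit limbs then eight 2-bit crumbs, matching the diagram above); the helper names are hypothetical:

```python
# Decompose an 88-bit value into six 12-bit limbs followed by eight 2-bit
# crumbs (6*12 + 8*2 = 88), high bits first, then reassemble to check that
# the decomposition is complete.

def decompose_v0(v):
    limbs, crumbs, shift = [], [], 88
    for _ in range(6):
        shift -= 12
        limbs.append((v >> shift) & 0xFFF)
    for _ in range(8):
        shift -= 2
        crumbs.append((v >> shift) & 0x3)
    return limbs, crumbs

def recompose(limbs, crumbs):
    v = 0
    for l in limbs:
        v = (v << 12) | l
    for c in crumbs:
        v = (v << 2) | c
    return v

v = (1 << 88) - 12345  # an arbitrary 88-bit value
limbs, crumbs = decompose_v0(v)
```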
Witness structure:
| Row | Contents |
|---|---|
| 0 | v0 and its decomposition |
| 1 | v1 and its decomposition |
| 2 | v2 and part of its decomposition |
| 3 | remaining limbs of v0, v1 and v2 |
- The first 2 rows contain v0 and v1 and their respective decompositions into 12-bit and 2-bit limbs
- The 3rd row contains v2 and part of its decomposition: four 12-bit limbs and the first 10 crumbs
- The final row contains v0's and v1's 5th and 6th 12-bit limbs as well as the remaining 10 crumbs of v2
Because we are constrained to 4 lookups per row, we are forced to postpone some lookups of v0 and v1 to the final row.
Constraints:
For efficiency, the limbs are constrained differently according to their type.
- 12-bit limbs are constrained with plookups
- 2-bit crumbs are constrained with degree-4 constraints
Layout:
This is how the three 88-bit inputs v0, v1 and v2 are laid out and constrained.
- vipj is the jth 12-bit limb of value vi
- vicj is the jth 2-bit crumb limb of value vi
| Gates | RangeCheck0 | RangeCheck0 | RangeCheck1 | Zero |
|---|---|---|---|---|
| Rows | 0 | 1 | 2 | 3 |
| Cols | ||||
| 0 | v0 | v1 | v2 | 0 |
| MS:1 | copy v0p0 | copy v1p0 | crumb v2c0 | crumb v2c10 |
| 2 | copy v0p1 | copy v1p1 | crumb v2c1 | crumb v2c11 |
| 3 | plookup v0p2 | plookup v1p2 | plookup v2p0 | plookup v0p0 |
| 4 | plookup v0p3 | plookup v1p3 | plookup v2p1 | plookup v0p1 |
| 5 | plookup v0p4 | plookup v1p4 | plookup v2p2 | plookup v1p0 |
| 6 | plookup v0p5 | plookup v1p5 | plookup v2p3 | plookup v1p1 |
| 7 | crumb v0c0 | crumb v1c0 | crumb v2c2 | crumb v2c12 |
| 8 | crumb v0c1 | crumb v1c1 | crumb v2c3 | crumb v2c13 |
| 9 | crumb v0c2 | crumb v1c2 | crumb v2c4 | crumb v2c14 |
| 10 | crumb v0c3 | crumb v1c3 | crumb v2c5 | crumb v2c15 |
| 11 | crumb v0c4 | crumb v1c4 | crumb v2c6 | crumb v2c16 |
| 12 | crumb v0c5 | crumb v1c5 | crumb v2c7 | crumb v2c17 |
| 13 | crumb v0c6 | crumb v1c6 | crumb v2c8 | crumb v2c18 |
| LS:14 | crumb v0c7 | crumb v1c7 | crumb v2c9 | crumb v2c19 |
The 12-bit chunks are constrained with plookups and the 2-bit crumbs are constrained with degree-4 constraints of the form x * (x - 1) * (x - 2) * (x - 3) = 0.
Note that copy denotes a plookup that is deferred to the 4th gate (i.e. Zero).
This is because of the limitation that we have at most 4 lookups per row.
The copies are constrained using the permutation argument.
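The degree-4 crumb constraint (the standard x * (x - 1) * (x - 2) * (x - 3) form) can be sanity-checked over the integers, and hence in the field for values far from the modulus:

```python
def crumb(x):
    """Degree-4 constraint: vanishes exactly on the four 2-bit values."""
    return x * (x - 1) * (x - 2) * (x - 3)

# Zero on {0, 1, 2, 3}, non-zero on everything else nearby:
assert all(crumb(x) == 0 for x in range(4))
assert all(crumb(x) != 0 for x in range(4, 100))
```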
Gate types:
Different rows are constrained using different CircuitGate types
| Row | CircuitGate | Purpose |
|---|---|---|
| 0 | RangeCheck0 | Partially constrain v0 |
| 1 | RangeCheck0 | Partially constrain v1 |
| 2 | RangeCheck1 | Fully constrain v2 (and trigger plookup constraints on row 3) |
| 3 | Zero | Complete the constraining of v0 and v1 using lookups |
Each CircuitGate type corresponds to a unique polynomial and thus is assigned its own unique powers of alpha.
RangeCheck0 - Range check constraints
- This circuit gate is used to partially constrain values v0 and v1
- Optionally, it can be used on its own as a single 64-bit range check by constraining columns 1 and 2 to zero
- The rest of v0 and v1 are constrained by the lookups in the Zero gate row
- This gate operates on the Curr row
It uses three different types of constraints
- copy - copy to another cell (12-bits)
- plookup - plookup (12-bits)
- crumb - degree-4 constraint (2-bits)
Given value v the layout looks like this
| Column | Curr |
|---|---|
| 0 | v |
| 1 | copy vp0 |
| 2 | copy vp1 |
| 3 | plookup vp2 |
| 4 | plookup vp3 |
| 5 | plookup vp4 |
| 6 | plookup vp5 |
| 7 | crumb vc0 |
| 8 | crumb vc1 |
| 9 | crumb vc2 |
| 10 | crumb vc3 |
| 11 | crumb vc4 |
| 12 | crumb vc5 |
| 13 | crumb vc6 |
| 14 | crumb vc7 |
where the notation vpi and vci is defined in the “Layout” section above.
RangeCheck1 - Range check constraints
- This circuit gate is used to fully constrain v2
- It operates on the Curr and Next rows
It uses two different types of constraints
- plookup - plookup (12-bits)
- crumb - degree-4 constraint (2-bits)
Given value v2 the layout looks like this
| Column | Curr | Next |
|---|---|---|
| 0 | v2 | (ignored) |
| 1 | crumb v2c0 | crumb v2c10 |
| 2 | crumb v2c1 | crumb v2c11 |
| 3 | plookup v2p0 | (ignored) |
| 4 | plookup v2p1 | (ignored) |
| 5 | plookup v2p2 | (ignored) |
| 6 | plookup v2p3 | (ignored) |
| 7 | crumb v2c2 | crumb v2c12 |
| 8 | crumb v2c3 | crumb v2c13 |
| 9 | crumb v2c4 | crumb v2c14 |
| 10 | crumb v2c5 | crumb v2c15 |
| 11 | crumb v2c6 | crumb v2c16 |
| 12 | crumb v2c7 | crumb v2c17 |
| 13 | crumb v2c8 | crumb v2c18 |
| 14 | crumb v2c9 | crumb v2c19 |
where the notation v2ci and v2pi is defined in the “Layout” section above.
Foreign Field Addition
These circuit gates are used to constrain that
left_input +/- right_input = field_overflow * foreign_modulus + result
Documentation
For more details please see the Foreign Field Addition RFC
Mapping
To make things clearer, the following mapping between the variable names used in the code and those of the RFC document can be helpful.
left_input_lo -> a0 right_input_lo -> b0 result_lo -> r0 bound_lo -> u0
left_input_mi -> a1 right_input_mi -> b1 result_mi -> r1 bound_mi -> u1
left_input_hi -> a2 right_input_hi -> b2 result_hi -> r2 bound_hi -> u2
field_overflow -> q
sign -> s
carry_lo -> c0
carry_mi -> c1
bound_carry_lo -> k0
bound_carry_mi -> k1
Note: Our limbs are 88 bits long. We denote with:
- lo the least significant limb (in little-endian, this is bits 0 to 87)
- mi the middle limb (in little-endian, this is bits 88 to 175)
- hi the most significant limb (in little-endian, this is bits 176 to 263)
Let left_input_lo, left_input_mi, left_input_hi be 88-bit limbs of the left element
Let right_input_lo, right_input_mi, right_input_hi be 88-bit limbs of the right element
Let foreign_modulus_lo, foreign_modulus_mi, foreign_modulus_hi be 88-bit limbs of the foreign modulus
Then the limbs of the result are
result_lo = left_input_lo +/- right_input_lo - field_overflow * foreign_modulus_lo - 2^{88} * result_carry_lo
result_mi = left_input_mi +/- right_input_mi - field_overflow * foreign_modulus_mi - 2^{88} * result_carry_mi + result_carry_lo
result_hi = left_input_hi +/- right_input_hi - field_overflow * foreign_modulus_hi + result_carry_mi
field_overflow = -1, 0 or 1 handles overflows in the field
result_carry_i are auxiliary variables that handle carries between limbs
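The limb equations above can be checked numerically with big integers. The sketch below (illustrative helper names, not the gate implementation) builds the witness values for one addition or subtraction and asserts the three per-limb constraints exactly as written:

```python
B = 1 << 88  # limb base

def limbs3(x):
    """Split a value below 2^264 into three 88-bit limbs, least significant first."""
    return x & (B - 1), (x >> 88) & (B - 1), x >> 176

def ffadd_witness(a, b, f, sign):
    total = a + sign * b
    q = total // f                 # field_overflow: -1, 0 or 1 for reduced inputs
    r = total - q * f              # reduced result
    a0, a1, a2 = limbs3(a); b0, b1, b2 = limbs3(b)
    f0, f1, f2 = limbs3(f); r0, r1, r2 = limbs3(r)
    c0 = (a0 + sign * b0 - q * f0 - r0) // B       # carry_lo
    c1 = (a1 + sign * b1 - q * f1 + c0 - r1) // B  # carry_mi
    # The three limb equations from the text hold exactly:
    assert r0 == a0 + sign * b0 - q * f0 - B * c0
    assert r1 == a1 + sign * b1 - q * f1 - B * c1 + c0
    assert r2 == a2 + sign * b2 - q * f2 + c1
    return (r0, r1, r2), q, (c0, c1)

f = (1 << 255) - 19
a, b = 3 * f // 4, 2 * f // 3
(r0, r1, r2), q, _ = ffadd_witness(a, b, f, +1)
assert r0 + (r1 << 88) + (r2 << 176) == (a + b) % f and q == 1
```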
Apart from the range checks of the chained inputs, we need to do an additional range check for a final bound
to make sure that the result is less than the modulus, by adding 2^{3*88} - foreign_modulus to it.
(This can be computed easily from the limbs of the modulus)
Note that 2^{264} expressed in limbs is (0, 0, 0, 1). Then:
The upper-bound check can be calculated as
bound_lo = result_lo - foreign_modulus_lo - bound_carry_lo * 2^{88}
bound_mi = result_mi - foreign_modulus_mi - bound_carry_mi * 2^{88} + bound_carry_lo
bound_hi = result_hi - foreign_modulus_hi + 2^{88} + bound_carry_mi
Which is equivalent to another foreign field addition with right input 2^{264}, q = 1 and s = 1
bound_lo = result_lo + s * 0 - q * foreign_modulus_lo - bound_carry_lo * 2^{88}
bound_mi = result_mi + s * 0 - q * foreign_modulus_mi - bound_carry_mi * 2^{88} + bound_carry_lo
bound_hi = result_hi + s * 2^{88} - q * foreign_modulus_hi + bound_carry_mi
bound_carry_i (0 or -1, i.e. carries or borrows) are auxiliary variables that handle carries between limbs
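The upper-bound computation bound = result + 2^{264} - foreign_modulus can also be checked limb-wise with big integers (helper names illustrative):

```python
B = 1 << 88  # limb base

def limbs3(x):
    return x & (B - 1), (x >> 88) & (B - 1), x >> 176

def bound_witness(r, f):
    bound = r + (1 << 264) - f       # fits in 264 bits iff r < f
    r0, r1, r2 = limbs3(r); f0, f1, f2 = limbs3(f)
    u0, u1, u2 = limbs3(bound)
    k0 = (r0 - f0 - u0) // B         # bound_carry_lo (may be -1, a borrow)
    k1 = (r1 - f1 + k0 - u1) // B    # bound_carry_mi
    # The limb equations from the text hold exactly:
    assert u0 == r0 - f0 - k0 * B
    assert u1 == r1 - f1 - k1 * B + k0
    assert u2 == r2 - f2 + B + k1    # the 2^{88} term comes from adding 2^{264}
    return (u0, u1, u2), (k0, k1)

f = (1 << 255) - 19
(u0, u1, u2), _ = bound_witness(f - 1, f)
assert u0 + (u1 << 88) + (u2 << 176) == (1 << 264) - 1  # r = f - 1 passes the bound
```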
The range check of the bound can be deferred until the end of the operations,
and the intermediate result is unused elsewhere (since the final result
must have had the right multiple of the modulus subtracted along the way).
In other words, intermediate results could potentially give a valid witness that satisfies the constraints
but where the result is larger than the modulus (yet smaller than 2^{264}). The reason we have a
final bound check is to make sure that the final result (min_result) is indeed the minimal one
(meaning less than the modulus).
A more optimized version of these constraints is able to reduce by 2 the number of constraints and by 1 the number of witness cells needed. The idea is to condense the low and middle limbs in one longer limb of 176 bits (which fits inside our native field) and getting rid of the low carry flag. With this idea in mind, the sole carry flag we need is the one located between the middle and the high limbs.
Layout
You could lay this out as a double-width gate for chained foreign additions and a final row, e.g.:
| col | ForeignFieldAdd | chain ForeignFieldAdd | final ForeignFieldAdd | final Zero |
|---|---|---|---|---|
| 0 | left_input_lo (copy) | result_lo (copy) | min_result_lo (copy) | bound_lo (copy) |
| 1 | left_input_mi (copy) | result_mi (copy) | min_result_mi (copy) | bound_mi (copy) |
| 2 | left_input_hi (copy) | result_hi (copy) | min_result_hi (copy) | bound_hi (copy) |
| 3 | right_input_lo (copy) | 0 (check) | ||
| 4 | right_input_mi (copy) | 0 (check) | ||
| 5 | right_input_hi (copy) | 2^88 (check) | ||
| 6 | sign (copy) | 1 (check) | ||
| 7 | field_overflow | 1 (check) | ||
| 8 | carry | bound_carry | ||
| 9 | ||||
| 10 | ||||
| 11 | ||||
| 12 | ||||
| 13 | ||||
| 14 |
We reuse the foreign field addition gate for the final bound check since this is an addition with a specific parameter structure. Checking that the correct right input, overflow, and sign are used is done by copy-constraining these values with a public input value. One could have a specific gate for just this check requiring fewer constraints, but the cost of adding one more selector gate outweighs the savings of one row and a few constraints.
Integration
- Copy signs from public input
- Range check the final bound
TODO: move sign to the coefficient so that the bound check can also check that ovf is one.
Foreign Field Multiplication
This gadget is used to constrain that
left_input * right_input = quotient * foreign_field_modulus + remainder
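The relation the gadget enforces can be checked numerically with big integers. This sketch is illustrative only; the gate itself works over 88-bit limbs as described below:

```python
def ffmul_witness(a, b, f):
    """Compute the quotient and remainder that the gadget constrains."""
    q, r = divmod(a * b, f)
    # left_input * right_input = quotient * foreign_field_modulus + remainder
    assert a * b == q * f + r and 0 <= r < f
    return q, r

f = (1 << 255) - 19
q, r = ffmul_witness(f - 2, f - 3, f)
assert r == 6  # (f-2)*(f-3) = f^2 - 5f + 6, which is 6 mod f
```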
Documentation
For more details please see the Foreign Field Multiplication RFC
Notations
For clarity, we use more descriptive variable names in the code than in the RFC, which uses mathematical notations.
In order to relate the two documents, the following mapping between the variable names used in the code and those of the RFC can be helpful.
left_input0 => a0 right_input0 => b0 quotient0 => q0 remainder0 => r0
left_input1 => a1 right_input1 => b1 quotient1 => q1 remainder1 => r1
left_input2 => a2 right_input2 => b2 quotient2 => q2 remainder2 => r2
product1_lo => p10 product1_hi_0 => p110 product1_hi_1 => p111
carry0 => v0 carry1_lo => v10 carry1_hi => v11
scaled_carry1_hi => scaled_v11
quotient_bound0 => q'0 quotient_bound12 => q'12
quotient_bound_carry0 => q'_carry0 quotient_bound_carry12 => q'_carry12
Suffixes
The variable names in this code use descriptive suffixes to convey information about the
positions of the bits referred to. When a word is split into up to n parts
we use: 0, 1 … n (where n is the most significant). For example, if we split
word x into three limbs, we’d name them x0, x1 and x2 or x[0], x[1] and x[2].
Continuing in this fashion, when one of those words is subsequently split in half, then we
add the suffixes _lo and _hi, where hi corresponds to the most significant bits.
For our running example, x1 would become x1_lo and x1_hi. If we are splitting into
more than two things, then we pick meaningful names for each.
So far we’ve explained our conventions for a splitting depth of up to 2. For splitting
deeper than two, we simply cycle back to our depth 1 suffixes again. So for example, x1_lo
would be split into x1_lo_0 and x1_lo_1.
Parameters
* `foreign_field_modulus` := foreign field modulus $f$ (stored in constraint system)
* `neg_foreign_field_modulus` := negated foreign field modulus $f'$ (computed by prover/verifier)
NB: the native field modulus is obtainable from F, the native field's trait bound below.
##### Witness
* `left_input` := left foreign field element multiplicand $\in F_f$
* `right_input` := right foreign field element multiplicand $\in F_f$
* `quotient` := foreign field quotient $\in F_f$
* `remainder` := foreign field remainder $\in F_f$
* `carry0` := 2 bit carry
* `carry1_lo` := low 88 bits of `carry1`
* `carry1_hi` := high 3 bits of `carry1`
* `scaled_carry1_hi` := `carry1_hi` scaled by 2^9
* `product1_lo` := lowest 88 bits of middle intermediate product
* `product1_hi_0` := lowest 88 bits of middle intermediate product's highest 88 + 2 bits
* `product1_hi_1` := highest 2 bits of middle intermediate product
* `quotient_bound` := quotient bound for checking `q < f`
* `quotient_bound_carry01` := quotient bound addition 1st carry bit
* `quotient_bound_carry2` := quotient bound addition 2nd carry bit
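The `product1_*` witness values describe a three-way split of the middle intermediate product. Assuming that product fits in 88 + 88 + 2 = 178 bits, as the limb sizes above suggest, the split can be sketched as follows (helper names illustrative):

```python
def split_product1(p1):
    """Cut the middle intermediate product into 88 + 88 + 2 bits."""
    assert 0 <= p1 < 1 << 178
    product1_lo = p1 & ((1 << 88) - 1)   # lowest 88 bits
    hi = p1 >> 88                        # highest 88 + 2 bits
    product1_hi_0 = hi & ((1 << 88) - 1)
    product1_hi_1 = hi >> 88             # highest 2 bits
    return product1_lo, product1_hi_0, product1_hi_1

p1 = (1 << 178) - 1
lo, hi0, hi1 = split_product1(p1)
assert lo + (hi0 << 88) + (hi1 << 176) == p1 and hi1 < 4
```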
##### Layout
The foreign field multiplication gate's rows are laid out like this
| col | `ForeignFieldMul` | `Zero` |
| --- | ---------------------------- | ------------------------- |
| 0 | `left_input0` (copy) | `remainder0` (copy) |
| 1 | `left_input1` (copy) | `remainder1` (copy) |
| 2 | `left_input2` (copy) | `remainder2` (copy) |
| 3 | `right_input0` (copy) | `quotient_bound01` (copy) |
| 4 | `right_input1` (copy) | `quotient_bound2` (copy) |
| 5 | `right_input2` (copy) | `product1_lo` (copy) |
| 6 | `carry1_lo` (copy) | `product1_hi_0` (copy) |
| 7 | `carry1_hi` (plookup) | `product1_hi_1` |
| 8 | `scaled_carry1_hi` (plookup) | |
| 9 | `carry0` | |
| 10 | `quotient0` | |
| 11 | `quotient1` | |
| 12 | `quotient2` | |
| 13 | `quotient_bound_carry01` | |
| 14 | `quotient_bound_carry2` | |
#### Xor
`Xor16` - Chainable XOR constraints for words of multiples of 16 bits.
* This circuit gate is used to constrain that `in1` xored with `in2` equals `out`
* The length of `in1`, `in2` and `out` must be the same and a multiple of 16 bits.
* This gate operates on the `Curr` and `Next` rows.
It uses three different types of constraints
* copy - copy to another cell (32-bits)
* plookup - xor-table plookup (4-bits)
* decomposition - the constraints inside the gate
The 4-bit crumbs are assumed to be laid out with `0` column being the least significant crumb.
Given values `in1`, `in2` and `out`, the layout looks like this:
| Column | `Curr` | `Next` |
| ------ | ---------------- | ---------------- |
| 0 | copy `in1` | copy `in1'` |
| 1 | copy `in2` | copy `in2'` |
| 2 | copy `out` | copy `out'` |
| 3 | plookup0 `in1_0` | |
| 4 | plookup1 `in1_1` | |
| 5 | plookup2 `in1_2` | |
| 6 | plookup3 `in1_3` | |
| 7 | plookup0 `in2_0` | |
| 8 | plookup1 `in2_1` | |
| 9 | plookup2 `in2_2` | |
| 10 | plookup3 `in2_3` | |
| 11 | plookup0 `out_0` | |
| 12 | plookup1 `out_1` | |
| 13 | plookup2 `out_2` | |
| 14 | plookup3 `out_3` | |
One single gate with the next values of `in1'`, `in2'` and `out'` being zero can be used to check
that the original `in1`, `in2` and `out` have 16 bits. We can chain this gate 4 times as follows
to obtain a gadget for 64-bit words XOR:
| Row | `CircuitGate` | Purpose |
| --- | ------------- | ------------------------------------------ |
| 0 | `Xor16` | Xor 2 least significant bytes of the words |
| 1 | `Xor16` | Xor next 2 bytes of the words |
| 2 | `Xor16` | Xor next 2 bytes of the words |
| 3 | `Xor16` | Xor 2 most significant bytes of the words |
| 4 | `Zero` | Zero values, can be reused as generic gate |
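The decomposition behind the chained gadget can be sketched as follows: a 64-bit XOR is checked 4 bits at a time against the 4-bit XOR table, with each word cut into sixteen 4-bit crumbs (names illustrative):

```python
# The 4-bit XOR plookup table, as a dictionary for illustration.
XOR4 = {(x, y): x ^ y for x in range(16) for y in range(16)}

def xor64_via_crumbs(in1, in2):
    out = 0
    for i in range(16):                  # sixteen 4-bit crumbs per 64-bit word
        a = (in1 >> (4 * i)) & 0xF
        b = (in2 >> (4 * i)) & 0xF
        out |= XOR4[(a, b)] << (4 * i)   # each crumb is looked up in the table
    return out

in1, in2 = 0x0123456789ABCDEF, 0xFEDCBA9876543210
assert xor64_via_crumbs(in1, in2) == in1 ^ in2
```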
```admonition::notice
We could halve the number of rows of the 64-bit XOR gadget by having lookups
for 8 bits at a time, but for now we will use the 4-bit XOR table that we have.
Rough computations show that if we run 8 or more Keccaks in one circuit we should
use the 8-bit XOR table.
```
## Setup
In this section we specify the setup that goes into creating two indexes from a circuit:
* A [*prover index*](#prover-index), necessary for the prover to create proofs.
* A [*verifier index*](#verifier-index), necessary for the verifier to verify proofs.
<div id="admonition-note-6" class="admonition note">
<div class="admonition-title">
Note
<a class="admonition-anchor-link" href="#admonition-note-6"></a>
</div>
<div>
The circuit creation part is not specified in this document. It might be specified in a separate document, or we might want to specify how to create the circuit description tables.
</div>
</div>
As such, the transformation of a circuit into these two indexes can be seen as a compilation step. Note that the prover still needs access to the original circuit to create proofs, as they need to execute it to create the witness (register table).
### Common Index
In this section we describe data that both the prover and the verifier index share.
**`URS` (Uniform Reference String)** The URS is a set of parameters that is generated once, and shared between the prover and the verifier.
It is used for polynomial commitments, so refer to the [poly-commitment specification](./poly-commitment.md) for more details.
<div id="admonition-note-7" class="admonition note">
<div class="admonition-title">
Note
<a class="admonition-anchor-link" href="#admonition-note-7"></a>
</div>
<div>
Kimchi currently generates the URS based on the circuit, and attaches it to the index. So each circuit can potentially be accompanied by a different URS. On the other hand, Mina reuses the same URS for multiple circuits ([see zkapps for more details](https://minaprotocol.com/blog/what-are-zkapps)).
</div>
</div>
**`Domain`**. A domain large enough to contain the circuit and the zero-knowledge rows (used to provide zero-knowledge to the protocol). Specifically, the smallest subgroup in our field that has order greater than or equal to `n + ZK_ROWS`, where `n` is the number of gates in the circuit.
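For FFT-friendly fields, the available subgroup orders are powers of two, so this amounts to rounding `n + ZK_ROWS` up to the next power of two. A minimal sketch, assuming `ZK_ROWS = 3` (the value is illustrative here):

```python
ZK_ROWS = 3  # zero-knowledge rows appended to the circuit (assumed value)

def domain_size(n):
    """Smallest power-of-two subgroup order >= n + ZK_ROWS."""
    size = 1
    while size < n + ZK_ROWS:
        size *= 2
    return size

assert domain_size(13) == 16   # 13 + 3 = 16 fits exactly
assert domain_size(14) == 32   # 14 + 3 = 17 no longer fits in 16
```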
TODO: what if the domain is larger than the URS?
<div id="admonition-ordering-of-elements-in-the-domain" class="admonition warning">
<div class="admonition-title">
Ordering of elements in the domain
<a class="admonition-anchor-link" href="#admonition-ordering-of-elements-in-the-domain"></a>
</div>
<div>
Note that in this specification we always assume that the first element of a domain is <span class="katex"><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:0.6444em;"></span><span class="mord">1</span></span></span></span>.
</div>
</div>
**`Shifts`**. As part of the permutation, we need to create `PERMUTS` shifts.
To do that, the following logic is followed (in pseudo code):
(TODO: move shift creation within the permutation section?)
```python
shifts[0] = 1  # the first shift is the identity

i = 7
for j in range(1, PERMUTS):  # sample the remaining PERMUTS - 1 shifts
    shift, i = sample(domain, i)
    while shift in shifts:
        shift, i = sample(domain, i)
    shifts[j] = shift

def sample(domain, i):
    i += 1
    shift = Field(Blake2b512(to_be_bytes(i)))
    while is_not_quadratic_non_residue(shift) or shift in domain:
        i += 1
        shift = Field(Blake2b512(to_be_bytes(i)))
    return shift, i
```
**`Public`**. This variable simply contains the number of public inputs. (TODO: actually, it's not contained in the verifier index)
The compilation steps to create the common index are as follows:
1. If the circuit has fewer than 2 gates, abort.
2. Create a domain for the circuit. That is,
   compute the smallest subgroup of the field that
   has order greater than or equal to `n + ZK_ROWS` elements.
3. Pad the circuit: add zero gates to reach the domain size.
4. Sample the `PERMUTS` shifts.
### Lookup Index
If lookup is used, the following values are added to the common index:
**`LookupSelectors`**. The list of lookup selectors used. In practice, this tells you which lookup tables are used.
**`TableIds`**. This is a list of table ids used by the Lookup gate.
**`MaxJointSize`**. This is the maximum number of columns appearing in the lookup tables used by the lookup selectors. For example, the XOR lookup has 3 columns.
To create the index, follow these steps:
1. If no lookup is used in the circuit, do not create a lookup index
2. Get the lookup selectors and lookup tables (TODO: how?)
3. Concatenate runtime lookup tables with the ones used by gates
4. Get the highest number of columns `max_table_width`
that a lookup table can have.
5. Create the concatenated table of all the fixed lookup tables.
It will be of height the size of the domain,
and of width the maximum width of any of the lookup tables.
In addition, create an additional column to store all the tables' table IDs.
For example, if you have a table with ID 0
| | | |
| :---: | :---: | :---: |
| 1 | 2 | 3 |
| 5 | 6 | 7 |
| 0 | 0 | 0 |
and another table with ID 1
| | |
| :---: | :---: |
| 8 | 9 |
the concatenated table in a domain of size 5 looks like this:
| | | |
| :---: | :---: | :---: |
| 1 | 2 | 3 |
| 5 | 6 | 7 |
| 0 | 0 | 0 |
| 8 | 9 | 0 |
| 0 | 0 | 0 |
with the table id vector:
| table id |
| :------: |
| 0 |
| 0 |
| 0 |
| 1 |
| 0 |
To do this, for each table:
- Update the corresponding entries in a table id vector (of size the domain as well)
with the table ID of the table.
- Copy the entries from the table to new rows in the corresponding columns of the concatenated table.
- Fill in any unused columns with 0 (to match the dummy value)
6. Pad the end of the concatenated table with the dummy value.
7. Pad the end of the table id vector with 0s.
8. Pre-compute the polynomial and evaluation form for the lookup tables.
9. Pre-compute the polynomial and evaluation form for the table IDs,
   only if a table with an ID different from zero was used.
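The concatenation steps above can be sketched as follows, reproducing the worked example with tables of IDs 0 and 1 in a domain of size 5 (the dummy value is assumed to be 0 here, and the helper name is illustrative):

```python
def concatenate_tables(tables, domain_size, dummy=0):
    """tables: list of (table_id, rows); returns (concatenated table, table id vector)."""
    width = max(len(row) for _, t in tables for row in t)
    big_table, table_ids = [], []
    for table_id, t in tables:
        for row in t:
            big_table.append(row + [dummy] * (width - len(row)))  # pad narrow rows
            table_ids.append(table_id)
    while len(big_table) < domain_size:                           # pad to domain size
        big_table.append([dummy] * width)
        table_ids.append(0)
    return big_table, table_ids

tables = [(0, [[1, 2, 3], [5, 6, 7], [0, 0, 0]]), (1, [[8, 9]])]
big, ids = concatenate_tables(tables, 5)
assert big == [[1, 2, 3], [5, 6, 7], [0, 0, 0], [8, 9, 0], [0, 0, 0]]
assert ids == [0, 0, 0, 1, 0]
```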
### Prover Index
Both the prover and the verifier index, besides the common parts described above, are made out of pre-computations which can be used to speed up the protocol.
These pre-computations are optimizations, in the context of normal proofs, but they are necessary for recursion.
```rs
pub struct ProverIndex<G: KimchiCurve> {
/// constraints system polynomials
#[serde(bound = "ConstraintSystem<G::ScalarField>: Serialize + DeserializeOwned")]
pub cs: ConstraintSystem<G::ScalarField>,
/// The symbolic linearization of our circuit, which can compile to concrete types once certain values are learned in the protocol.
#[serde(skip)]
pub linearization: Linearization<Vec<PolishToken<G::ScalarField>>>,
/// The mapping between powers of alpha and constraints
#[serde(skip)]
pub powers_of_alpha: Alphas<G::ScalarField>,
/// polynomial commitment keys
#[serde(skip)]
pub srs: Arc<SRS<G>>,
/// maximal size of polynomial section
pub max_poly_size: usize,
#[serde(bound = "EvaluatedColumnCoefficients<G::ScalarField>: Serialize + DeserializeOwned")]
pub evaluated_column_coefficients: EvaluatedColumnCoefficients<G::ScalarField>,
#[serde(bound = "ColumnEvaluations<G::ScalarField>: Serialize + DeserializeOwned")]
pub column_evaluations: ColumnEvaluations<G::ScalarField>,
/// The verifier index corresponding to this prover index
#[serde(skip)]
pub verifier_index: Option<VerifierIndex<G>>,
/// The verifier index digest corresponding to this prover index
#[serde_as(as = "Option<o1_utils::serialization::SerdeAs>")]
pub verifier_index_digest: Option<G::BaseField>,
}
```
### Verifier Index
Same as the prover index, we have a number of pre-computations as part of the verifier index.
```rs
#[serde_as]
#[derive(Serialize, Deserialize, Debug, Clone)]
pub struct LookupVerifierIndex<G: CommitmentCurve> {
pub lookup_used: LookupsUsed,
#[serde(bound = "PolyComm<G>: Serialize + DeserializeOwned")]
pub lookup_table: Vec<PolyComm<G>>,
#[serde(bound = "PolyComm<G>: Serialize + DeserializeOwned")]
pub lookup_selectors: LookupSelectors<PolyComm<G>>,
/// Table IDs for the lookup values.
/// This may be `None` if all lookups originate from table 0.
#[serde(bound = "PolyComm<G>: Serialize + DeserializeOwned")]
pub table_ids: Option<PolyComm<G>>,
/// Information about the specific lookups used
pub lookup_info: LookupInfo,
/// An optional selector polynomial for runtime tables
#[serde(bound = "PolyComm<G>: Serialize + DeserializeOwned")]
pub runtime_tables_selector: Option<PolyComm<G>>,
}
#[serde_as]
#[derive(Serialize, Deserialize, Debug, Clone)]
pub struct VerifierIndex<G: KimchiCurve> {
/// evaluation domain
#[serde_as(as = "o1_utils::serialization::SerdeAs")]
pub domain: D<G::ScalarField>,
/// maximal size of polynomial section
pub max_poly_size: usize,
/// polynomial commitment keys
#[serde(skip)]
pub srs: OnceCell<Arc<SRS<G>>>,
/// number of public inputs
pub public: usize,
/// number of previous evaluation challenges, for recursive proving
pub prev_challenges: usize,
// index polynomial commitments
/// permutation commitment array
#[serde(bound = "PolyComm<G>: Serialize + DeserializeOwned")]
pub sigma_comm: [PolyComm<G>; PERMUTS],
/// coefficient commitment array
#[serde(bound = "PolyComm<G>: Serialize + DeserializeOwned")]
pub coefficients_comm: [PolyComm<G>; COLUMNS],
/// coefficient commitment array
#[serde(bound = "PolyComm<G>: Serialize + DeserializeOwned")]
pub generic_comm: PolyComm<G>,
// poseidon polynomial commitments
/// poseidon constraint selector polynomial commitment
#[serde(bound = "PolyComm<G>: Serialize + DeserializeOwned")]
pub psm_comm: PolyComm<G>,
// ECC arithmetic polynomial commitments
/// EC addition selector polynomial commitment
#[serde(bound = "PolyComm<G>: Serialize + DeserializeOwned")]
pub complete_add_comm: PolyComm<G>,
/// EC variable base scalar multiplication selector polynomial commitment
#[serde(bound = "PolyComm<G>: Serialize + DeserializeOwned")]
pub mul_comm: PolyComm<G>,
/// endoscalar multiplication selector polynomial commitment
#[serde(bound = "PolyComm<G>: Serialize + DeserializeOwned")]
pub emul_comm: PolyComm<G>,
/// endoscalar multiplication scalar computation selector polynomial commitment
#[serde(bound = "PolyComm<G>: Serialize + DeserializeOwned")]
pub endomul_scalar_comm: PolyComm<G>,
/// Chacha polynomial commitments
#[serde(bound = "PolyComm<G>: Serialize + DeserializeOwned")]
pub chacha_comm: Option<[PolyComm<G>; 4]>,
/// Range check polynomial commitments
#[serde(bound = "PolyComm<G>: Serialize + DeserializeOwned")]
pub range_check_comm: Option<[PolyComm<G>; range_check::gadget::GATE_COUNT]>,
/// Foreign field modulus
pub foreign_field_modulus: Option<BigUint>,
/// Foreign field addition gates polynomial commitments
#[serde(bound = "Option<PolyComm<G>>: Serialize + DeserializeOwned")]
pub foreign_field_add_comm: Option<PolyComm<G>>,
/// Foreign field multiplication gates polynomial commitments
#[serde(bound = "Option<PolyComm<G>>: Serialize + DeserializeOwned")]
pub foreign_field_mul_comm: Option<PolyComm<G>>,
/// Xor commitments
#[serde(bound = "Option<PolyComm<G>>: Serialize + DeserializeOwned")]
pub xor_comm: Option<PolyComm<G>>,
/// wire coordinate shifts
#[serde_as(as = "[o1_utils::serialization::SerdeAs; PERMUTS]")]
pub shift: [G::ScalarField; PERMUTS],
/// zero-knowledge polynomial
#[serde(skip)]
pub zkpm: OnceCell<DensePolynomial<G::ScalarField>>,
// TODO(mimoo): isn't this redundant with domain.d1.group_gen ?
/// domain offset for zero-knowledge
#[serde(skip)]
pub w: OnceCell<G::ScalarField>,
/// endoscalar coefficient
#[serde(skip)]
pub endo: G::ScalarField,
#[serde(bound = "PolyComm<G>: Serialize + DeserializeOwned")]
pub lookup_index: Option<LookupVerifierIndex<G>>,
#[serde(skip)]
pub linearization: Linearization<Vec<PolishToken<G::ScalarField>>>,
/// The mapping between powers of alpha and constraints
#[serde(skip)]
pub powers_of_alpha: Alphas<G::ScalarField>,
}
```
## Proof Construction & Verification
Kimchi is based on an interactive protocol that was transformed into a non-interactive one using the [Fiat-Shamir](https://o1-labs.github.io/mina-book/crypto/plonk/fiat_shamir.html) transform.
For this reason, it can be useful to visualize the high-level interactive protocol before the transformation:
```mermaid
sequenceDiagram
participant Prover
participant Verifier
Note over Prover,Verifier: Prover produces commitments to secret polynomials
Prover->>Verifier: public input & witness commitment
Verifier->>Prover: beta & gamma
Prover->>Verifier: permutation commitment
opt lookup
Prover->>Verifier: sorted
Prover->>Verifier: aggreg
end
Note over Prover,Verifier: Prover produces commitment to quotient polynomial
Verifier->>Prover: alpha
Prover->>Verifier: quotient commitment
Note over Prover,Verifier: Verifier produces an evaluation point
Verifier->>Prover: zeta
Note over Prover,Verifier: Prover provides helper evaluations
Prover->>Verifier: the generic selector gen(zeta) & gen(zeta * omega)
Prover->>Verifier: the poseidon selector pos(zeta) & pos(zeta * omega)
Prover->>Verifier: negated public input p(zeta) & p(zeta * omega)
Note over Prover,Verifier: Prover provides needed evaluations for the linearization
Note over Verifier: change of verifier (change of sponge)
Prover->>Verifier: permutation poly z(zeta) & z(zeta * omega)
Prover->>Verifier: the 15 registers w_i(zeta) & w_i(zeta * omega)
Prover->>Verifier: the 6 sigmas s_i(zeta) & s_i(zeta * omega)
Prover->>Verifier: ft(zeta * omega)
opt lookup
Prover->>Verifier: sorted(zeta) & sorted(zeta * omega)
Prover->>Verifier: aggreg(zeta) & aggreg(zeta * omega)
Prover->>Verifier: table(zeta) & table(zeta * omega)
end
Note over Prover,Verifier: Batch verification of evaluation proofs
Verifier->>Prover: u, v
Note over Verifier: change of verifier (change of sponge)
Prover->>Verifier: aggregated evaluation proof (involves more interaction)
```
The Fiat-Shamir transform simulates the verifier messages via a hash function that hashes the transcript of the protocol so far before outputting verifier messages.
You can find these operations under the [proof creation](#proof-creation) and [proof verification](#proof-verification) algorithms as absorption and squeezing of values with the sponge.
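The absorb/squeeze pattern can be sketched as follows. This is a minimal toy transcript, not Kimchi's Poseidon-based sponge: the `ToySponge` type and its splitmix64-style mixing function are hypothetical stand-ins, used only to show that both sides deterministically derive the same challenges from the same transcript.

```rust
/// Toy Fiat-Shamir transcript: every verifier message is replaced by a
/// hash of everything absorbed so far. NOT Kimchi's Poseidon sponge; a
/// sketch over u64 words using splitmix64-style mixing.
struct ToySponge {
    state: u64,
}

impl ToySponge {
    fn new() -> Self {
        ToySponge { state: 0x9e37_79b9_7f4a_7c15 }
    }

    /// Absorb a prover message into the transcript.
    fn absorb(&mut self, msg: u64) {
        let mut x = self
            .state
            .wrapping_add(msg)
            .wrapping_add(0x9e37_79b9_7f4a_7c15);
        x = (x ^ (x >> 30)).wrapping_mul(0xbf58_476d_1ce4_e5b9);
        x = (x ^ (x >> 27)).wrapping_mul(0x94d0_49bb_1331_11eb);
        self.state = x ^ (x >> 31);
    }

    /// Squeeze a "verifier" challenge deterministically from the transcript.
    fn squeeze(&mut self) -> u64 {
        let challenge = self.state;
        // Re-absorb the output so successive squeezes differ.
        self.absorb(challenge);
        challenge
    }
}

fn main() {
    // Prover and verifier run the same transcript and agree on challenges.
    let mut prover = ToySponge::new();
    let mut verifier = ToySponge::new();
    for commitment in [11u64, 22, 33] {
        prover.absorb(commitment);
        verifier.absorb(commitment);
    }
    assert_eq!(prover.squeeze(), verifier.squeeze());
}
```

In the real protocol the sponge operates over field elements and curve points, and two sponges (Fq and Fr) are used, as detailed in the proof creation and verification algorithms.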
### Proof Structure
A proof consists of the following data structures:
```rs
/// Evaluations of a polynomial at 2 points
#[serde_as]
#[derive(Copy, Clone, Serialize, Deserialize, Default, Debug)]
#[cfg_attr(
feature = "ocaml_types",
derive(ocaml::IntoValue, ocaml::FromValue, ocaml_gen::Struct)
)]
#[serde(bound(
serialize = "Vec<o1_utils::serialization::SerdeAs>: serde_with::SerializeAs<Evals>",
deserialize = "Vec<o1_utils::serialization::SerdeAs>: serde_with::DeserializeAs<'de, Evals>"
))]
pub struct PointEvaluations<Evals> {
/// Evaluation at the challenge point zeta.
#[serde_as(as = "Vec<o1_utils::serialization::SerdeAs>")]
pub zeta: Evals,
/// Evaluation at `zeta . omega`, the product of the challenge point and the group generator.
#[serde_as(as = "Vec<o1_utils::serialization::SerdeAs>")]
pub zeta_omega: Evals,
}
/// Evaluations of lookup polynomials
#[serde_as]
#[derive(Debug, Clone, Serialize, Deserialize)]
pub struct LookupEvaluations<Evals> {
/// sorted lookup table polynomial
pub sorted: Vec<Evals>,
/// lookup aggregation polynomial
pub aggreg: Evals,
// TODO: May be possible to optimize this away?
/// lookup table polynomial
pub table: Evals,
/// Optionally, a runtime table polynomial.
pub runtime: Option<Evals>,
}
// TODO: this should really be vectors here, perhaps create another type for chunked evaluations?
/// Polynomial evaluations contained in a `ProverProof`.
/// - **Chunked evaluations** `Field` is instantiated with vectors with a length that equals the length of the chunk
/// - **Non chunked evaluations** `Field` is instantiated with a field, so they are single-sized
#[serde_as]
#[derive(Debug, Clone, Serialize, Deserialize)]
pub struct ProofEvaluations<Evals> {
/// witness polynomials
pub w: [Evals; COLUMNS],
/// permutation polynomial
pub z: Evals,
/// permutation polynomials
/// (PERMUTS-1 evaluations because the last permutation is only used in commitment form)
pub s: [Evals; PERMUTS - 1],
/// coefficient polynomials
pub coefficients: [Evals; COLUMNS],
/// lookup-related evaluations
pub lookup: Option<LookupEvaluations<Evals>>,
/// evaluation of the generic selector polynomial
pub generic_selector: Evals,
/// evaluation of the poseidon selector polynomial
pub poseidon_selector: Evals,
}
/// Commitments linked to the lookup feature
#[serde_as]
#[derive(Debug, Clone, Serialize, Deserialize)]
#[serde(bound = "G: ark_serialize::CanonicalDeserialize + ark_serialize::CanonicalSerialize")]
pub struct LookupCommitments<G: AffineCurve> {
/// Commitments to the sorted lookup table polynomial (may have chunks)
pub sorted: Vec<PolyComm<G>>,
/// Commitment to the lookup aggregation polynomial
pub aggreg: PolyComm<G>,
/// Optional commitment to concatenated runtime tables
pub runtime: Option<PolyComm<G>>,
}
/// All the commitments that the prover creates as part of the proof.
#[serde_as]
#[derive(Debug, Clone, Serialize, Deserialize)]
#[serde(bound = "G: ark_serialize::CanonicalDeserialize + ark_serialize::CanonicalSerialize")]
pub struct ProverCommitments<G: AffineCurve> {
/// The commitments to the witness (execution trace)
pub w_comm: [PolyComm<G>; COLUMNS],
/// The commitment to the permutation polynomial
pub z_comm: PolyComm<G>,
/// The commitment to the quotient polynomial
pub t_comm: PolyComm<G>,
/// Commitments related to the lookup argument
pub lookup: Option<LookupCommitments<G>>,
}
/// The proof that the prover creates from a [ProverIndex](super::prover_index::ProverIndex) and a `witness`.
#[serde_as]
#[derive(Debug, Clone, Serialize, Deserialize)]
#[serde(bound = "G: ark_serialize::CanonicalDeserialize + ark_serialize::CanonicalSerialize")]
pub struct ProverProof<G: AffineCurve> {
/// All the polynomial commitments required in the proof
pub commitments: ProverCommitments<G>,
/// batched commitment opening proof
pub proof: OpeningProof<G>,
/// Two evaluations over a number of committed polynomials
pub evals: ProofEvaluations<PointEvaluations<Vec<G::ScalarField>>>,
/// Required evaluation for [Maller's optimization](https://o1-labs.github.io/mina-book/crypto/plonk/maller_15.html#the-evaluation-of-l)
#[serde_as(as = "o1_utils::serialization::SerdeAs")]
pub ft_eval1: G::ScalarField,
/// The public input
#[serde_as(as = "Vec<o1_utils::serialization::SerdeAs>")]
pub public: Vec<G::ScalarField>,
/// The challenges underlying the optional polynomials folded into the proof
pub prev_challenges: Vec<RecursionChallenge<G>>,
}
/// A struct to store the challenges inside a `ProverProof`
#[serde_as]
#[derive(Debug, Clone, Deserialize, Serialize)]
#[serde(bound = "G: ark_serialize::CanonicalDeserialize + ark_serialize::CanonicalSerialize")]
pub struct RecursionChallenge<G>
where
G: AffineCurve,
{
/// Vector of scalar field elements
#[serde_as(as = "Vec<o1_utils::serialization::SerdeAs>")]
pub chals: Vec<G::ScalarField>,
/// Polynomial commitment
pub comm: PolyComm<G>,
}
```
The following sections specify how a prover creates a proof, and how a verifier validates a number of proofs.
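The chunked evaluations mentioned in `ProofEvaluations` above can be sketched as follows. This is a toy illustration over a small prime field (a Mersenne prime chosen for the sketch); the real protocol uses the curve's scalar field and the index's `max_poly_size` as the chunk size.

```rust
/// Toy modulus for the sketch: 2^61 - 1, a Mersenne prime.
const P: u128 = 2_305_843_009_213_693_951;

/// Evaluate a polynomial (coefficients in ascending order) at x, mod P,
/// using Horner's rule.
fn eval(coeffs: &[u128], x: u128) -> u128 {
    coeffs.iter().rev().fold(0, |acc, &c| (acc * x + c) % P)
}

/// "Chunk evaluate": split the coefficients into chunks of size `n`
/// (so f(x) = f_0(x) + x^n f_1(x) + ...) and evaluate each chunk at x.
fn chunk_eval(coeffs: &[u128], n: usize, x: u128) -> Vec<u128> {
    coeffs.chunks(n).map(|chunk| eval(chunk, x)).collect()
}

/// Recombine chunk evaluations: sum_i x^{i*n} * f_i(x), mod P.
fn recombine(chunks: &[u128], n: usize, x: u128) -> u128 {
    let xn = (0..n).fold(1u128, |acc, _| (acc * x) % P);
    chunks.iter().rev().fold(0, |acc, &e| (acc * xn + e) % P)
}

fn main() {
    let coeffs: Vec<u128> = (1..=8).collect(); // degree-7 polynomial
    let (n, x) = (4, 12_345u128);
    let chunks = chunk_eval(&coeffs, n, x);
    assert_eq!(chunks.len(), 2); // two chunks of size 4
    // Recombining the chunk evaluations recovers the full evaluation.
    assert_eq!(recombine(&chunks, n, x), eval(&coeffs, x));
}
```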
### Proof Creation
To create a proof, the prover expects:
* A prover index, containing a representation of the circuit (and optionally pre-computed values to be used in the proof creation).
* The (filled) registers table, representing parts of the execution trace of the circuit.
<div id="admonition-note-8" class="admonition note">
<div class="admonition-title">
Note
<a class="admonition-anchor-link" href="#admonition-note-8"></a>
</div>
<div>
The public input is expected to be passed in the first `Public` rows of the registers table.
</div>
</div>
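The negated public input polynomial used later in the proof-creation steps can be sketched in evaluation form as follows. This is a toy illustration over a small prime field (2^31 - 1, chosen for the sketch); the real protocol uses the curve's scalar field.

```rust
/// Toy modulus for the sketch: 2^31 - 1, a Mersenne prime.
const P: u64 = 2_147_483_647;

/// Negation in the toy field.
fn neg(x: u64) -> u64 {
    (P - (x % P)) % P
}

/// Build the evaluations of the negated public input polynomial over a
/// domain of `domain_size` points: -p_i on the first rows (where the
/// public input lives), 0 everywhere else.
fn negated_public_evals(public_input: &[u64], domain_size: usize) -> Vec<u64> {
    assert!(public_input.len() <= domain_size);
    let mut evals = vec![0u64; domain_size];
    for (i, &p) in public_input.iter().enumerate() {
        evals[i] = neg(p);
    }
    evals
}

fn main() {
    let evals = negated_public_evals(&[5, 7], 8);
    assert_eq!(evals[0], P - 5);
    assert_eq!(evals[1], P - 7);
    // All remaining rows evaluate to zero.
    assert!(evals[2..].iter().all(|&e| e == 0));
}
```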
The following constants are set:
* `EVAL_POINTS = 2`. This is the number of points that the prover has to evaluate their polynomials at.
($\zeta$ and $\zeta\omega$ where $\zeta$ will be deterministically generated.)
* `ZK_ROWS = 3`. This is the number of rows that will be randomized to provide zero-knowledgeness.
Note that it only needs to be greater than or equal to the number of evaluations (2) in the protocol.
However, it contains one extra row to take into account the last constraint (the final value of the permutation accumulator).
(TODO: treat the final constraint separately so that ZK_ROWS = 2)
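The zero-knowledge padding implied by `ZK_ROWS` can be sketched per witness column as follows. The `random_field_element` helper is a hypothetical stand-in for sampling from the scalar field; here it is a simple xorshift generator over u64, used only to illustrate the padding-then-randomizing shape of steps 1 and 2 below.

```rust
const ZK_ROWS: usize = 3;

/// Hypothetical stand-in for sampling a random field element:
/// xorshift64 over u64 (nonzero seeds never produce 0).
fn random_field_element(seed: &mut u64) -> u64 {
    *seed ^= *seed << 13;
    *seed ^= *seed >> 7;
    *seed ^= *seed << 17;
    *seed
}

/// Pad `column` with zeros up to `domain_size`, then randomize its last
/// ZK_ROWS entries. Errors if the circuit leaves no room for the
/// zero-knowledge rows.
fn pad_and_randomize(
    mut column: Vec<u64>,
    domain_size: usize,
    seed: &mut u64,
) -> Result<Vec<u64>, &'static str> {
    if column.len() + ZK_ROWS > domain_size {
        return Err("not enough room for zero-knowledge rows");
    }
    column.resize(domain_size, 0); // pad with zeros up to the domain size
    for row in column.iter_mut().skip(domain_size - ZK_ROWS) {
        *row = random_field_element(seed);
    }
    Ok(column)
}

fn main() {
    let mut seed = 42;
    let col = pad_and_randomize(vec![1, 2, 3], 8, &mut seed).unwrap();
    assert_eq!(col.len(), 8);
    assert_eq!(&col[..3], &[1, 2, 3]); // original witness untouched
    assert_eq!(&col[3..5], &[0, 0]); // zero padding before the random rows
    // A column too long to fit the ZK rows is rejected.
    assert!(pad_and_randomize(vec![0; 7], 8, &mut seed).is_err());
}
```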
The prover then follows the following steps to create the proof:
1. Ensure we have room in the witness for the zero-knowledge rows.
We currently expect the witness not to be of the same length as the domain,
but instead be of the length of the (smaller) circuit.
If we cannot add `ZK_ROWS` rows to the columns of the witness before reaching
the size of the domain, abort.
1. Pad the witness columns with Zero gates to make them the same length as the domain.
Then, randomize the last `ZK_ROWS` of each column.
1. Set up the Fq-Sponge.
1. Absorb the digest of the VerifierIndex.
1. Absorb the commitments of the previous challenges with the Fq-sponge.
1. Compute the negated public input polynomial as
the polynomial that evaluates to $-p_i$ for the first `public_input_size` values of the domain,
and $0$ for the rest.
1. Commit (non-hiding) to the negated public input polynomial.
1. Absorb the commitment to the public polynomial with the Fq-Sponge.
Note: unlike the original PLONK protocol,
the prover also provides evaluations of the public polynomial to help the verifier circuit.
This is why we need to absorb the commitment to the public polynomial at this point.
1. Commit to the witness columns by creating `COLUMNS` hiding commitments.
Note: since the witness is in evaluation form,
we can use the `commit_evaluation` optimization.
1. Absorb the witness commitments with the Fq-Sponge.
1. Compute the witness polynomials by interpolating each of the `COLUMNS` witness columns.
TODO: why not do this first, and then commit? Why commit from evaluation directly?
1. If using lookup:
- if using runtime table:
- check that all the provided runtime tables have lengths and IDs that match the runtime table configuration of the index.
We expect the given runtime tables to be sorted as configured; this makes things easier afterwards.
- calculate the contribution to the second column of the lookup table
(the runtime vector)
- If queries involve a lookup table with multiple columns
then squeeze the Fq-Sponge to obtain the joint combiner challenge $j'$,
otherwise set the joint combiner challenge $j'$ to $0$.
- Derive the scalar joint combiner $j$ from $j'$ using the endomorphism (TODO: specify)
- If multiple lookup tables are involved,
set the `table_id_combiner` as the $j^i$ with $i$ the maximum width of any used table.
Essentially, this is to add a last column of table ids to the concatenated lookup tables.
- Compute the dummy lookup value as the combination of the last entry of the XOR table (so `(0, 0, 0)`).
Warning: This assumes that we always use the XOR table when using lookups.
- Compute the lookup table values as the combination of the lookup table entries.
- Compute the sorted evaluations.
- Randomize the last `EVALS` rows in each of the sorted polynomials
in order to add zero-knowledge to the protocol.
- Commit each of the sorted polynomials.
- Absorb each commitment to the sorted polynomials.
1. Sample $\beta$ with the Fq-Sponge.
1. Sample $\gamma$ with the Fq-Sponge.
1. If using lookup:
- Compute the lookup aggregation polynomial.
- Commit to the aggregation polynomial.
- Absorb the commitment to the aggregation polynomial with the Fq-Sponge.
1. Compute the permutation aggregation polynomial $z$.
1. Commit (hiding) to the permutation aggregation polynomial $z$.
1. Absorb the permutation aggregation polynomial $z$ with the Fq-Sponge.
1. Sample $\alpha'$ with the Fq-Sponge.
1. Derive $\alpha$ from $\alpha'$ using the endomorphism (TODO: details)
1. TODO: instantiate alpha?
1. Compute the quotient polynomial (the $t$ in $f = Z_H \cdot t$).
The quotient polynomial is computed by adding all these polynomials together:
- the combined constraints for all the gates
- the combined constraints for the permutation
- TODO: lookup
- the negated public polynomial
and by then dividing the resulting polynomial with the vanishing polynomial $Z_H$.
TODO: specify the split of the permutation polynomial into perm and bnd?
1. Commit (hiding) to the quotient polynomial $t$.
TODO: specify the dummies
1. Absorb the commitment of the quotient polynomial with the Fq-Sponge.
1. Sample $\zeta'$ with the Fq-Sponge.
1. Derive $\zeta$ from $\zeta'$ using the endomorphism (TODO: specify)
1. If lookup is used, evaluate the following polynomials at $\zeta$ and $\zeta\omega$:
- the aggregation polynomial
- the sorted polynomials
- the table polynomial
1. Chunk evaluate the following polynomials at both $\zeta$ and $\zeta\omega$:
- $s_i$
- $w_i$
- $z$
- lookup (TODO)
- generic selector
- poseidon selector
By "chunk evaluate" we mean that the evaluation of each polynomial can potentially be a vector of values.
This is because the index's `max_poly_size` parameter dictates the maximum size of a polynomial in the protocol.
If a polynomial $f$ exceeds this size, it must be split into several polynomials like so:
$$f(x) = f_0(x) + x^n f_1(x) + x^{2n} f_2(x) + \cdots$$
And the evaluation of such a polynomial is the following list for $x \in \{\zeta, \zeta\omega\}$:
$$(f_0(x), f_1(x), f_2(x), \ldots)$$
TODO: do we want to specify more on that? It seems unnecessary except for the $t$ polynomial (or if for some reason someone sets that to a low value)
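To make the chunking concrete, here is a minimal Python sketch (over a hypothetical small prime field, not the Pasta fields kimchi actually uses) of splitting a polynomial into chunks of the domain size and evaluating each chunk:

```python
# Illustrative sketch of chunked polynomial evaluation over a toy prime field.
P = 101  # hypothetical small prime modulus

def chunk(coeffs, n):
    """Split a coefficient list into chunks of at most n coefficients."""
    return [coeffs[i:i + n] for i in range(0, len(coeffs), n)]

def evaluate(coeffs, x):
    """Horner evaluation of a coefficient list at x, mod P."""
    acc = 0
    for c in reversed(coeffs):
        acc = (acc * x + c) % P
    return acc

# A degree-6 polynomial with domain size n = 4 gives two chunks f_0, f_1;
# its chunked evaluation at x is the list (f_0(x), f_1(x)).
f = [3, 1, 4, 1, 5, 9, 2]
chunks = chunk(f, 4)
evals = [evaluate(c, 7) for c in chunks]
# The full evaluation is recovered as sum_i x^(n*i) * f_i(x).
full = sum(pow(7, 4 * i, P) * e for i, e in enumerate(evals)) % P
assert full == evaluate(f, 7)
```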
1. Evaluate the same polynomials without chunking them
(so that each polynomial should correspond to a single value this time).
1. Compute the ft polynomial.
This is to implement [Maller's optimization](https://o1-labs.github.io/mina-book/crypto/plonk/maller_15.html).
1. Construct the blinding part of the ft polynomial commitment
(see [the section on evaluation proofs and blinding factors](https://o1-labs.github.io/mina-book/crypto/plonk/maller_15.html#evaluation-proof-and-blinding-factors)).
1. Evaluate the ft polynomial at $\zeta\omega$ only.
1. Set up the Fr-Sponge.
1. Squeeze the Fq-Sponge and absorb the result with the Fr-Sponge.
1. Absorb the previous recursion challenges.
1. Compute evaluations for the previous recursion challenges.
1. Evaluate the negated public polynomial (if present) at $\zeta$ and $\zeta\omega$.
1. Absorb the unique evaluation of ft: $ft(\zeta\omega)$.
1. Absorb all the polynomial evaluations in $\zeta$ and $\zeta\omega$:
- the public polynomial
- z
- generic selector
- poseidon selector
- the 15 register/witness polynomials
- the 6 sigma evaluations (the last sigma polynomial is not evaluated)
1. Sample $v'$ with the Fr-Sponge.
1. Derive $v$ from $v'$ using the endomorphism (TODO: specify).
1. Sample $u'$ with the Fr-Sponge.
1. Derive $u$ from $u'$ using the endomorphism (TODO: specify).
1. Create a list of all polynomials that will require evaluations
(and evaluation proofs) in the protocol.
First, include the previous challenges, in case we are in a recursive prover.
1. Then, include:
- the negated public polynomial
- the ft polynomial
- the permutation aggregation polynomial z
- the generic selector
- the poseidon selector
- the 15 registers/witness columns
- the 6 sigmas
- optionally, the runtime table
1. If using lookup:
- add the lookup sorted polynomials
- add the lookup aggregation polynomial
- add the combined table polynomial
- if present, add the runtime table polynomial
1. Create an aggregated evaluation proof for all of these polynomials at $\zeta$ and $\zeta\omega$ using $u$ and $v$.
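The aggregation step above can be sketched as follows. This is an illustrative Python model over a toy field: the real protocol aggregates openings inside the polynomial commitment scheme, but the way the challenge $v$ (across polynomials) and $u$ (across the two evaluation points) enter is the same:

```python
# Illustrative model of evaluation aggregation with the challenges v and u.
# The real proof aggregates commitments/opening proofs, not raw field values.
P = 101  # hypothetical small prime modulus

def aggregate(evals_at_zeta, evals_at_zeta_omega, v, u):
    """Combine per-polynomial evaluations: powers of v across polynomials,
    then u across the two evaluation points zeta and zeta*omega."""
    combined_zeta = sum(pow(v, i, P) * e
                        for i, e in enumerate(evals_at_zeta)) % P
    combined_zeta_omega = sum(pow(v, i, P) * e
                              for i, e in enumerate(evals_at_zeta_omega)) % P
    return (combined_zeta + u * combined_zeta_omega) % P

# e.g. three polynomials evaluated at zeta and at zeta*omega:
agg = aggregate([1, 2, 3], [4, 5, 6], v=2, u=3)
```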
### Proof Verification
TODO: we talk about batch verification, but is there an actual batch operation? It seems like we're just verifying an aggregated opening proof
We define two helper algorithms below, used in the batch verification of proofs.
#### Fiat-Shamir argument
We run the following algorithm:
1. Set up the Fq-Sponge.
1. Absorb the digest of the VerifierIndex.
1. Absorb the commitments of the previous challenges with the Fq-sponge.
1. Absorb the commitment of the public input polynomial with the Fq-Sponge.
1. Absorb the commitments to the registers / witness columns with the Fq-Sponge.
1. If lookup is used:
- If it involves queries to a multiple-column lookup table,
then squeeze the Fq-Sponge to obtain the joint combiner challenge $j'$,
otherwise set the joint combiner challenge $j'$ to $0$.
- Derive the scalar joint combiner challenge $j$ from $j'$ using the endomorphism.
(TODO: specify endomorphism)
- Absorb the commitments to the sorted polynomials.
1. Sample $\beta$ with the Fq-Sponge.
1. Sample $\gamma$ with the Fq-Sponge.
1. If using lookup, absorb the commitment to the aggregation lookup polynomial.
1. Absorb the commitment to the permutation trace with the Fq-Sponge.
1. Sample $\alpha'$ with the Fq-Sponge.
1. Derive $\alpha$ from $\alpha'$ using the endomorphism (TODO: details).
1. Enforce that the length of the $t$ commitment is of size `PERMUTS`.
1. Absorb the commitment to the quotient polynomial $t$ into the argument.
1. Sample $\zeta'$ with the Fq-Sponge.
1. Derive $\zeta$ from $\zeta'$ using the endomorphism (TODO: specify).
1. Set up the Fr-Sponge.
1. Squeeze the Fq-Sponge and absorb the result with the Fr-Sponge.
1. Absorb the previous recursion challenges.
1. Compute evaluations for the previous recursion challenges.
1. Evaluate the negated public polynomial (if present) at $\zeta$ and $\zeta\omega$.
NOTE: this works only in the case when the poly segment size is not smaller than that of the domain.
1. Absorb the unique evaluation of ft: $ft(\zeta\omega)$.
1. Absorb all the polynomial evaluations in $\zeta$ and $\zeta\omega$:
- the public polynomial
- z
- generic selector
- poseidon selector
- the 15 register/witness polynomials
- the 6 sigma evaluations (the last sigma polynomial is not evaluated)
1. Sample $v'$ with the Fr-Sponge.
1. Derive $v$ from $v'$ using the endomorphism (TODO: specify).
1. Sample $u'$ with the Fr-Sponge.
1. Derive $u$ from $u'$ using the endomorphism (TODO: specify).
1. Create a list of all polynomials that have an evaluation proof.
1. Compute the evaluation of $ft(\zeta)$.
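The absorb/squeeze pattern used throughout the Fiat-Shamir argument can be sketched with a toy transcript. Kimchi's sponges are Poseidon-based over the Pasta fields; `blake2b` below is a stand-in, not the actual construction:

```python
import hashlib

# Toy transcript illustrating the absorb/squeeze pattern of the
# Fiat-Shamir argument. NOT the real sponge: kimchi uses Poseidon.
class Transcript:
    def __init__(self):
        self.state = b""

    def absorb(self, data: bytes):
        # accumulate everything the verifier would have seen so far
        self.state += data

    def squeeze(self) -> bytes:
        # derive a challenge from the whole transcript, then chain the
        # state so that successive challenges are distinct
        digest = hashlib.blake2b(self.state).digest()
        self.state = digest
        return digest

t = Transcript()
t.absorb(b"verifier-index-digest")
t.absorb(b"witness-commitments")
beta = t.squeeze()
gamma = t.squeeze()
assert beta != gamma  # each squeeze yields a fresh challenge
```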
#### Partial verification
For every proof we want to verify, we defer the proof opening to the very end.
This allows us to potentially batch verify a number of partially verified proofs.
Essentially, this step verifies that $f(\zeta) = t(\zeta) \cdot Z_H(\zeta)$.
1. Commit to the negated public input polynomial.
1. Run the [Fiat-Shamir argument](#fiat-shamir-argument).
1. Combine the chunked polynomials' evaluations
(TODO: most likely only the quotient polynomial is chunked)
with the right powers of $\zeta^n$ and $(\zeta\omega)^n$.
1. Compute the commitment to the linearized polynomial $f$.
To do this, add the constraints of all of the gates, of the permutation,
and optionally of the lookup.
(See the separate sections in the [constraints](#constraints) section.)
Any polynomial should be replaced by its associated commitment,
contained in the verifier index or in the proof,
unless a polynomial has its evaluation provided by the proof
in which case the evaluation should be used in place of the commitment.
1. Compute the (chunked) commitment of $ft$
(see [Maller's optimization](../crypto/plonk/maller_15.html)).
1. List the polynomial commitments, and their associated evaluations,
that are associated to the aggregated evaluation proof in the proof:
- recursion
- public input commitment
- ft commitment (chunks of it)
- permutation commitment
- index commitments that use the coefficients
- witness commitments
- coefficient commitments
- sigma commitments
- lookup commitments
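A toy illustration of the identity behind partial verification: the vanishing polynomial $Z_H(x) = x^n - 1$ is zero on the domain $H$, so checking $f(\zeta) = t(\zeta) \cdot Z_H(\zeta)$ at a random off-domain point $\zeta$ convinces the verifier that $Z_H$ divides $f$. The field, domain size, and root of unity below are illustrative values, not protocol parameters:

```python
# Toy field and domain for the quotient identity f(zeta) = t(zeta) * Z_H(zeta).
P = 101
n = 4
omega = 10  # a primitive n-th root of unity mod 101 (10^4 = 1 mod 101)

def vanishing(x):
    """Z_H(x) = x^n - 1, the vanishing polynomial of the domain H."""
    return (pow(x, n, P) - 1) % P

# Z_H vanishes on every domain point H = {omega^i} ...
assert all(vanishing(pow(omega, i, P)) == 0 for i in range(n))

# ... but not at a random off-domain point zeta, so dividing by Z_H(zeta)
# is well defined there and the check is meaningful.
zeta = 5
assert vanishing(zeta) != 0
```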
#### Batch verification of proofs
Below, we define the steps to verify a number of proofs
(each associated to a [verifier index](#verifier-index)).
You can, of course, use it to verify a single proof.
1. If there's no proof to verify, the proof validates trivially.
1. Ensure that all the proofs' verifier indexes have a URS of the same length. (TODO: do they have to be the same URS though? Should we check for that?)
1. Validate each proof separately following the [partial verification](#partial-verification) steps.
1. Use the [`PolyCom.verify`](#polynomial-commitments) to verify the partially evaluated proofs.
## Optimizations
* `commit_evaluation`: TODO
## Security Considerations
TODO
Universal Reference String (URS)
Note that the URS is called SRS in our codebase at the moment. SRS is incorrect, as it is used in the presence of a trusted setup (which kimchi does not have). This needs to be fixed.
The URS comprises:
- Gs: an arbitrary ordered list of curve points that can be used to commit to a polynomial in a non-hiding way.
- H: a blinding curve point that can be used to add hiding to a polynomial commitment scheme.
The URS is generated deterministically, and thus can be rederived if not stored.
As circuits of different sizes do not share the same domain, and the size of the domain is directly equal to the number of Gs curve points we use to commit to polynomials, we often want to share the same URS of maximum size across many circuits (constraint systems).
Circuits associated with smaller domains will simply use a truncation of the shared URS.
A circuit’s domain size is the smallest power of 2 that is equal to or larger than the number of gates used in the circuit (including zero-knowledge gates).
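The domain-size rule can be sketched as:

```python
# Smallest power of two that is at least the number of gates
# (including zero-knowledge gates), as described above.
def domain_size(num_gates: int) -> int:
    n = 1
    while n < num_gates:
        n *= 2
    return n

assert domain_size(5) == 8      # 5 gates round up to a domain of size 8
assert domain_size(8) == 8      # exact powers of two are kept as-is
assert domain_size(1000) == 1024
```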
Group map
TODO: specify this part
Curve points to commit to a polynomial
The following pseudo-code specifies how the Gs curve points, as well as the H curve point, are generated.
def create(depth):
    # generators
    Gs = []
    for i in range(0, depth):
        digest = Blake2b512.hash(be_encode(i))
        Gs.push(point_of_random_bytes(digest))

    # blinding point
    digest = Blake2b512.hash(b"srs_misc" || be_encode(0))
    H = point_of_random_bytes(digest)

    return (Gs, H)

def point_of_random_bytes(random_bytes):
    # packing in bit-representation
    N = 31
    bits = [false] * (8 * N)
    for i in range(0, N):
        for j in range(0, 8):
            bits[8 * i + j] = (random_bytes[i] >> j) & 1 == 1

    n = BigInt.from_bits_be(bits)
    t = G.BaseField.from_repr(n)
    return map.to_group(t)
TODO: specify map.to_group in the previous section.
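For the digest-derivation part of `create`, which does not depend on the unspecified group map, here is a runnable Python sketch using `hashlib`. The helper names and the 4-byte big-endian width of `be_encode` are assumptions for illustration, not taken from the spec:

```python
import hashlib

# Runnable sketch of the digest derivation in `create` above.
# ASSUMPTIONS: be_encode is taken to be 4-byte big-endian, and the
# function names here are hypothetical; the group map itself is TODO.
def be_encode(i: int) -> bytes:
    return i.to_bytes(4, "big")  # assumed encoding width

def generator_digest(i: int) -> bytes:
    """Digest used to derive the i-th generator G_i."""
    return hashlib.blake2b(be_encode(i), digest_size=64).digest()

def blinding_digest() -> bytes:
    """Digest used to derive the blinding point H."""
    return hashlib.blake2b(b"srs_misc" + be_encode(0), digest_size=64).digest()

assert len(generator_digest(0)) == 64  # Blake2b512 output length
assert generator_digest(0) != generator_digest(1)
```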
URS values in practice
As there is no limit to the number of commitment curve points you can generate, we only provide the first three to serve as test vectors.
TODO: specify the encoding
Vesta
Gs:
G0 = (x=121C4426885FD5A9701385AAF8D43E52E7660F1FC5AFC5F6468CC55312FC60F8, y=21B439C01247EA3518C5DDEB324E4CB108AF617780DDF766D96D3FD8AB028B70)
G1 = (x=26C9349FF7FB4AB230A6F6AEF045F451FBBE9B37C43C3274E2AA4B82D131FD26, y=1996274D67EC0464C51F79CCFA1F511C2AABB666ABE67733EE8185B71B27A504)
G2 = (x=26985F27306586711466C5B2C28754AA62FE33516D75CEF1F7751F1A169713FD, y=2E8930092FE6A18B331CE0E6E27B413AA18E76394F18A2835DA9FAE10AA3229D)
H:
H = (x=092060386301C999AAB4F263757836369CA27975E28BC7A8E5B2CE5B26262201, y=314FC4D83AE66A509F9D41BE6165F2606A209A9B5805EE85CE20249C5EBCBE26)
Pallas
Gs:
G0 = (x=363D83141FD1E0540718FADBA7278ABAEEDB46D7A3F050F2CFF1DF4F300C9C30, y=034C68F4079B4F338A19BE2D7BFA44B395C65B9790DD273F361327446C778764)
G1 = (x=2CC40B77D87665244AE5EB5304E8744004C80061AD08476A0F0656C13134EA45, y=28146EC860159DB55CB5EA5B14F0AA2F8751DEDFE0DDAFD1C313B15575C4B4AC)
G2 = (x=2808BC21BEB90314377BF6130285FABE6CE4B8A4457FB25BC95EBA0083DF27E3, y=1E04E53DD6395FAB8018D7FE98F9C7FAB39C40BFBE48589626A7B8532728B002)
H:
H = (x221B959DACD2052AAE26193FCA36B53279866A4FBBAB0D5A2F828B5FD7778201, y=058C8F1105CAE57F4891EADC9B85C8954E5067190E155E61D66855ACE69C16C0)
Pickles
Pickles is Mina’s inductive zk-SNARK composition system. It lets you construct proofs with zk-SNARKs and combine them in flexible ways to deliver incremental verifiable computation.
Pickles uses a pair of amicable curves called Pasta in order to deliver incremental verifiable computation efficiently.
These curves are referred to as “tick” and “tock” within the Mina source code.
- Tick - Vesta (a.k.a. Step), with the larger constraint domain size [block and transaction proofs]
- Tock - Pallas (a.k.a. Wrap), with the smaller constraint domain size [signatures]
The Tock prover does less (only performs recursive verifications and no other logic), so it requires fewer constraints and has a smaller domain size. Internally Pickles refers to Tick and Tock as Step and Wrap, respectively.
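The "amicable" property can be checked numerically: the base field of each Pasta curve is the scalar field of the other, which is what makes efficient recursion possible. The primes below are the published Pasta parameters (Pallas base field `p`, Vesta base field `q`), external to this document:

```python
# Published Pasta primes: Pallas is defined over F_p with scalar field F_q,
# and Vesta is defined over F_q with scalar field F_p (a 2-cycle of curves).
p = 0x40000000000000000000000000000000224698FC094CF91B992D30ED00000001
q = 0x40000000000000000000000000000000224698FC0994A8DD8C46EB2100000001

# The two primes are distinct but share the same size and overall shape
# (both slightly above 2^254), so proofs over one field wrap efficiently
# into circuits over the other.
assert p != q
assert p.bit_length() == q.bit_length() == 255
```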
Tock is used to prove the verification of a Tick proof and outputs a Tick proof. Tick is used to prove the verification of a Tock proof and outputs a Tock proof.
Both Tick and Tock can verify at most 2 proofs of the opposite kind, though theoretically more is possible.
Currently, in Mina we have the following situation.
- Every Tock always wraps 1 Tick proof.
- Tick proofs can verify 2 Tock proofs, for example:
  - the Blockchain SNARK takes a previous blockchain SNARK proof and a transaction proof
  - verifying two Tock transaction proofs
Consensus
- The VRF is specified in Appendix C of https://eprint.iacr.org/2017/573.pdf